December 2025
AWS Clean Rooms has rolled out a new feature, privacy-enhancing synthetic dataset generation, that makes it possible to train machine learning models on shared data without exposing real personal or sensitive records. The launch is positioned as a strategic step toward scalable, privacy-preserving AI deployment across industries. It also signals a broader trend toward safer data collaboration.

ML model training has long forced many organizations to choose between two undesirable options: protect user privacy and accept limited insights, or use rich, detailed data to obtain accurate models and take on privacy risk. With the new synthetic data option in Clean Rooms, that trade-off becomes much easier to resolve. Rather than feeding real user records into ML pipelines, companies create a completely new synthetic dataset that mimics the statistical patterns of the original data but contains no actual individual-level records. This allows partners to collaborate on shared data with reduced legal risk. It can also speed up model development by shortening lengthy privacy approval workflows.
Under the hood, Clean Rooms uses advanced algorithms to learn the underlying distributions and relationships in the original datasets and then samples new data points with the same overall structure. For tasks like classification or regression, this means ML models trained on the synthetic data behave nearly as if they had seen the real dataset, without the privacy risk of training on real records. The approach aims to maintain high model performance while removing identifiable data. It also supports enterprise-grade data pipelines without a large sacrifice in accuracy.
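
The "learn the distribution, then sample from it" idea can be illustrated with a deliberately simple sketch. This is not the algorithm AWS Clean Rooms uses, and it calls no AWS API: it assumes a toy dataset with two numeric columns (hypothetically named age and spend), fits a bivariate Gaussian to it, and draws brand-new rows from the fitted model. The function names fit_gaussian and sample_synthetic are made up for illustration.

```python
import math
import random

def fit_gaussian(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    vx = sum((x - mx) ** 2 for x, _ in rows) / n
    vy = sum((y - my) ** 2 for _, y in rows) / n
    cov = sum((x - mx) * (y - my) for x, y in rows) / n
    return mx, my, math.sqrt(vx), math.sqrt(vy), cov / math.sqrt(vx * vy)

def sample_synthetic(params, n, seed=0):
    """Draw n new rows from the fitted bivariate Gaussian (no real row is copied)."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 into y so the sampled columns keep the learned correlation.
        y = my + sy * (rho * z1 + math.sqrt(max(0.0, 1 - rho ** 2)) * z2)
        rows.append((x, y))
    return rows

# Toy "real" data (made up for this sketch): spend depends on age plus noise.
_rng = random.Random(42)
real = []
for _ in range(2000):
    age = _rng.gauss(40, 10)
    real.append((age, 5 * age + _rng.gauss(0, 30)))

synthetic = sample_synthetic(fit_gaussian(real), 2000, seed=1)

print("real correlation:     ", round(fit_gaussian(real)[4], 3))
print("synthetic correlation:", round(fit_gaussian(synthetic)[4], 3))
```

The synthetic rows share the means, spreads, and correlation of the original columns, yet none of them is a copy of an original record. A production system would need a far richer model (categorical columns, non-linear relationships) plus formal privacy protections, which this sketch omits.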
What's more, the service gives data owners control over privacy parameters. You can configure thresholds for how much noise is added and how resistant the synthetic data should be to membership inference attacks. After generation, Clean Rooms also provides quality metrics, including a fidelity score and a privacy score. These metrics give teams transparency and confidence when using synthetic data. They also help organizations balance performance and privacy based on business needs.
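
To make the fidelity and privacy trade-off concrete, here is a minimal sketch with toy metrics. These are not the scores Clean Rooms reports, and no AWS API is involved: the functions fidelity_score, privacy_score, and add_noise, the distance threshold, and the stand-in "synthetic" data (real rows plus Gaussian noise) are all assumptions chosen for illustration.

```python
import math
import random

def correlation(rows):
    """Pearson correlation between the two columns of `rows`."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cov = sum((x - mx) * (y - my) for x, y in rows)
    vx = sum((x - mx) ** 2 for x, _ in rows)
    vy = sum((y - my) ** 2 for _, y in rows)
    return cov / math.sqrt(vx * vy)

def fidelity_score(real, synthetic):
    """Toy fidelity: 1.0 when the correlations match, lower as they diverge."""
    return 1.0 - abs(correlation(real) - correlation(synthetic)) / 2.0

def privacy_score(real, synthetic, threshold):
    """Toy privacy: share of synthetic rows farther than `threshold` from every real row."""
    far = sum(1 for s in synthetic
              if min(math.dist(s, r) for r in real) > threshold)
    return far / len(synthetic)

def add_noise(rows, scale, seed=0):
    """Stand-in generator: perturb each value with Gaussian noise of the given scale."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0, scale), y + rng.gauss(0, scale)) for x, y in rows]

# Toy "real" data, made up for this sketch.
_rng = random.Random(7)
real = []
for _ in range(500):
    age = _rng.gauss(40, 10)
    real.append((age, 5 * age + _rng.gauss(0, 30)))

low_noise = add_noise(real, scale=0.01)
high_noise = add_noise(real, scale=10.0)

for name, data in [("low noise", low_noise), ("high noise", high_noise)]:
    print(name,
          "fidelity:", round(fidelity_score(real, data), 3),
          "privacy:", round(privacy_score(real, data, threshold=0.5), 3))
```

With very little noise, the output stays almost identical to the real records, so the toy fidelity is high and the toy privacy is zero. With more noise, privacy rises and fidelity drops, which is the kind of balance the configurable thresholds are meant to control.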
With this launch, scenarios previously blocked by privacy concerns, such as collaborating across organizations, combining disparate customer datasets, and building models for ad targeting or fraud detection, become feasible. Organizations can unlock the power of combined data while respecting privacy and compliance norms. The feature enables new business models based on data sharing and co-created AI. It also positions AWS as a leader in privacy-centric machine learning.