Overview
Synthetic Data Generator (SDG) is a self-hosted web application that produces realistic, privacy-safe synthetic datasets from your own production data, without moving that data outside your AWS account. Point SDG at an Oracle, MySQL, PostgreSQL, SQL Server, or SFTP source; it profiles the real data, auto-detects columns containing PII (names, emails, phones, SSNs, and custom patterns you define), and generates synthetic output that preserves statistical distributions, data types, and key relationships.
Key capabilities
1. Broad source coverage
Oracle, MySQL, PostgreSQL, SQL Server, and SFTP (.csv, .xml, .json, .dat). Configure a connection once in the UI; SDG introspects schemas, tables, and columns automatically.
2. Automated PII detection
Microsoft Presidio engine plus extensible custom rules. Catches the usual identifiers (names, emails, phones, SSNs, credit cards) out of the box, and lets your team add domain-specific patterns such as policy numbers, account IDs, and tax IDs so nothing sensitive slips through untagged.
3. Faithful synthesis
Preserves column distributions, column-pair correlations, and primary/foreign-key relationships. Downstream tests behave the way they would against production data, without any of the exposure.
4. Runs in your account
Deployed from a single AMI into your VPC; data never leaves your AWS boundary and no third-party service sees it.
5. Web UI
Angular frontend for configuring connections, browsing schemas, selecting tables, reviewing detected PII, and triggering generation. Generated data can be written back to a target database or downloaded as files.
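To give a feel for what a domain-specific PII pattern can look like, here is a minimal sketch using only Python's standard `re` module. It is not SDG's or Presidio's actual rule format, which this listing does not document. The policy-number format (`POL-` followed by eight digits), the 50% column threshold, and the names `CUSTOM_PATTERNS`, `find_pii`, and `column_is_pii` are all assumptions made for illustration.

```python
import re

# Hypothetical formats, chosen only for illustration; real policy numbers
# and account IDs will follow your organization's own conventions.
CUSTOM_PATTERNS = {
    "POLICY_NUMBER": re.compile(r"\bPOL-\d{8}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in CUSTOM_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def column_is_pii(values: list[str], threshold: float = 0.5) -> bool:
    """Flag a column when at least `threshold` of its non-empty values match."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    matching = sum(1 for v in non_empty if find_pii(v))
    return matching / len(non_empty) >= threshold
```

Sampling a column and flagging it by match rate, rather than on a single hit, is one plausible way to keep a stray match in a free-text field from tagging the whole column.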
Who it's for
Data, QA, and platform teams who need production-realistic datasets for development, testing, training, or demos but can't legally or safely use real customer data. Especially useful for regulated industries (finance, insurance, healthcare) where data-sharing boundaries and PII handling are audited.
What you get
A hardened Ubuntu 24.04 AMI with SDG preinstalled (gunicorn + nginx), CloudWatch-ready logging, and a built-in 7-day trial license. Full licenses are issued through PDI's automated license workflow: the customer submits a short registration form, PDI's sales team is notified via Microsoft Teams, and a 30-day license key is provided.
Vendor refund policy
If you are not satisfied with Synthetic Data Generator, contact Pacific Data Integrators to request a refund. Requests are evaluated case-by-case per our standard terms and your AWS Marketplace subscription.
AWS infrastructure charges (EC2, EBS, data transfer) are billed by AWS and are not refundable by PDI.
Email support@pacificdataintegrators.com with your AWS Marketplace subscription ID, purchase date, and reason. We respond within a few business days.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
SDG 2026.1 expands database coverage beyond Oracle, replaces the email-based license workflow with a Microsoft Teams alert so PDI's sales team can respond within hours instead of days, and resolves several build-time issues that prevented earlier versions from booting on freshly baked images.
What's new
- Database support for MySQL, PostgreSQL, and SQL Server alongside the existing Oracle and SFTP sources.
- License registration now notifies PDI's sales team through Microsoft Teams with product, cloud, and customer details, replacing the email workflow.
- Updated landing-page UX after license registration with a clear "Request received" confirmation and auto-redirect.
- Built on Ubuntu 24.04 with Python 3.12, gunicorn 23, and the latest security patches from upstream Canonical.
Fixes
- Gunicorn now boots cleanly: the compat_data_describer and compat_data_generator modules are properly compiled and bundled into the image.
- Removed a dead pyspark import path that was preventing synthetic_data_generator from loading at runtime.
- The image ships with a fresh built-in 7-day trial license, and activation now reliably reaches the license service.
Upgrade notes
This is a new AMI version; there is no in-place upgrade. Existing customers on earlier versions should launch a fresh instance from 2026.1, re-create their database connections, and continue using the same license key. Licenses issued against earlier AMI versions remain valid until expiration.
Additional details
Usage instructions
1. Launch an EC2 instance from the AMI
Choose an instance type based on your workload:
- t3.small: trial / evaluation (minimum)
- t3.medium or m6i.large: standard production
- m6i.xlarge or r6i.large: large tables (over 1M rows) or many PII columns
Accept the default 30 GB root volume for light use; increase it to 100-200 GB if you will write large synthetic CSV outputs to disk before downloading them.
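The sizing guidance above can be written down as a small lookup, which may be handy in a provisioning script. This is only a sketch: the function names and boolean flags are hypothetical, "many PII columns" has no numeric threshold in the guidance so it is left as a caller-supplied flag, and 100 GB is simply the low end of the suggested 100-200 GB range.

```python
def suggest_instance_types(
    max_table_rows: int,
    production: bool,
    many_pii_columns: bool = False,
) -> list[str]:
    """Map a workload to the instance types named in the sizing guidance."""
    # Large tables (over 1M rows) or many PII columns get the largest tier.
    if max_table_rows > 1_000_000 or many_pii_columns:
        return ["m6i.xlarge", "r6i.large"]
    if production:
        return ["t3.medium", "m6i.large"]
    # Trial / evaluation minimum.
    return ["t3.small"]


def suggest_root_volume_gb(writes_large_csv_to_disk: bool) -> int:
    """Return 30 GB for light use, or 100 GB when large CSVs land on disk."""
    return 100 if writes_large_csv_to_disk else 30
```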
2. Configure the security group
Allow inbound traffic from the IPs that will use SDG:
| Port | Protocol | Purpose |
|---|---|---|
| 8082 | TCP | SDG web UI and API (required) |
| 22 | TCP | SSH for administration (optional; restrict to admin IPs only) |
Outbound: allow 443/TCP to reach the PDI license service and AWS APIs.
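If you script the security group instead of using the console, the inbound rules above might be expressed as follows. This sketch only builds and validates the rule data with the standard library; the dictionaries are shaped like the `IpPermissions` structure accepted by the EC2 `AuthorizeSecurityGroupIngress` API, but nothing here calls AWS. The function name, the IPv4-only restriction, and the refusal of `0.0.0.0/0` are choices made for this example, not SDG requirements.

```python
import ipaddress


def build_ingress_rules(user_cidrs, admin_cidrs=()):
    """Build ingress rules: 8082/TCP for users, 22/TCP for admins only."""

    def rule(port, cidrs, description):
        ranges = []
        for cidr in cidrs:
            # Raises ValueError if the CIDR is malformed or has host bits set.
            network = ipaddress.IPv4Network(cidr)
            if network.prefixlen == 0:
                raise ValueError(f"refusing to open port {port} to {cidr}")
            ranges.append({"CidrIp": str(network), "Description": description})
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": ranges,
        }

    rules = [rule(8082, user_cidrs, "SDG web UI and API")]
    if admin_cidrs:
        # SSH is optional; add it only when admin addresses are supplied.
        rules.append(rule(22, admin_cidrs, "SSH administration"))
    return rules
```

For example, `build_ingress_rules(["203.0.113.0/24"], ["198.51.100.7/32"])` yields one rule for port 8082 and one for port 22; both address ranges are documentation placeholders.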
3. Open the web UI
Browse to http://INSTANCE_PUBLIC_IP:8082/. The landing page loads a 7-day trial license automatically; no sign-up is required to evaluate. You can start generating synthetic data immediately.
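If the page does not load, a quick scripted check can help separate a security-group problem from an application problem. The sketch below uses only the standard library; the function names are hypothetical, and it treats any HTTP response (even an error status) as proof that the server is reachable.

```python
import urllib.error
import urllib.request


def sdg_url(host: str, port: int = 8082) -> str:
    """Build the web UI URL from the instance address."""
    return f"http://{host}:{port}/"


def ui_is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or blocked by the security group.
        return False
```

Run `ui_is_reachable(sdg_url("INSTANCE_PUBLIC_IP"))` with your instance's address substituted, from a machine whose IP the security group allows.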
4. Request a full license
From the landing page (or the License Expired modal after the trial ends), click Request License. Fill in:
- Client Name
- Email Address
- State, Country
- Phone (optional)
Submit the form. PDI's sales team is notified in real time and will contact you with your 30-day license key. Paste the key into the Upload License screen to activate it.
5. Connect a data source
Navigate to Data Connections, then Add Connection. SDG supports:
- Oracle
- MySQL
- PostgreSQL
- SQL Server
- SFTP (.csv, .xml, .json, .dat files)
Enter host, port, credentials, and (for databases) schema. Click Test Connection to verify before saving.
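When Test Connection fails, it can help to confirm basic network reachability first. The following sketch, run from the SDG instance itself, checks whether a TCP connection to the source can be opened. The port numbers are the conventional defaults for each product and your servers may use different ones; the helper names are hypothetical and this is not how SDG performs its own connection test.

```python
import socket
from typing import Optional

# Conventional default ports for each source type; your servers may differ.
DEFAULT_PORTS = {
    "oracle": 1521,
    "mysql": 3306,
    "postgresql": 5432,
    "sqlserver": 1433,
    "sftp": 22,
}


def resolve_port(source_type: str, port: Optional[int] = None) -> int:
    """Use the explicit port if given, otherwise the conventional default."""
    if port is not None:
        return port
    key = source_type.lower().replace(" ", "")
    if key not in DEFAULT_PORTS:
        raise ValueError(f"unknown source type: {source_type!r}")
    return DEFAULT_PORTS[key]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A successful TCP connection only shows that routing and firewalls allow the traffic; credentials and schema access are still verified by Test Connection in the UI.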
6. Generate synthetic data
- Navigate to Synthetic Data Generation.
- Pick a saved connection.
- Select a schema and table.
- Review the PII columns SDG auto-detected (you can add or remove columns manually).
- Choose how many synthetic rows to generate.
- Run the job. When it completes, download the synthetic dataset as CSV or write it back to a target database.
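After downloading a synthetic CSV, you may want a quick sanity check that it resembles the source table. The sketch below compares a numeric column's mean and spread and a categorical column's value shares between two CSVs, using only the standard library. It is an independent spot check under stated assumptions, not SDG's own quality measure, and the column names in the usage note (`state`, `premium`) are made up.

```python
import csv
import io
import statistics
from collections import Counter


def load_csv(text: str) -> list[dict]:
    """Parse CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def numeric_summary(rows: list[dict], column: str) -> dict:
    """Mean and population standard deviation of a column's non-empty values."""
    values = [float(r[column]) for r in rows if r[column] != ""]
    return {"mean": statistics.fmean(values), "stdev": statistics.pstdev(values)}


def category_shares(rows: list[dict], column: str) -> dict:
    """Fraction of rows taken by each distinct value of a column."""
    counts = Counter(r[column] for r in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def max_share_gap(real_rows: list[dict], synthetic_rows: list[dict], column: str) -> float:
    """Largest absolute difference in category share between the two datasets."""
    real = category_shares(real_rows, column)
    synth = category_shares(synthetic_rows, column)
    return max(
        (abs(real.get(k, 0.0) - synth.get(k, 0.0)) for k in real.keys() | synth.keys()),
        default=0.0,
    )
```

For instance, after loading both files with `load_csv(open(path).read())`, compare `numeric_summary(real, "premium")` against `numeric_summary(synthetic, "premium")` and look at `max_share_gap(real, synthetic, "state")`; a small gap suggests the categorical distribution carried over. Keep the real extract inside your AWS account when doing so.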
7. Logs and troubleshooting
SSH into the instance (as ubuntu) and check:
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.