    Synthetic Data Generator

    Deployed on AWS
    AWS Free Tier
    Synthetic Data Generator (SDG) creates realistic, privacy-safe test datasets from Oracle, MySQL, PostgreSQL, SQL Server, and SFTP sources. It auto-detects PII, preserves statistical shape and referential integrity, and runs entirely inside your own AWS account.

    Overview

    Synthetic Data Generator (SDG) is a self-hosted web application that produces realistic, privacy-safe synthetic datasets from your own production data, without moving that data outside your AWS account. Point SDG at an Oracle, MySQL, PostgreSQL, SQL Server, or SFTP source; it profiles the real data, auto-detects columns containing PII (names, emails, phones, SSNs, and custom patterns you define), and generates synthetic output that preserves statistical distributions, data types, and key relationships.

    Key capabilities

    1. Broad source coverage

    Oracle, MySQL, PostgreSQL, SQL Server, and SFTP (.csv, .xml, .json, .dat). Configure a connection once in the UI; SDG introspects schemas, tables, and columns automatically.

    2. Automated PII detection

    Microsoft Presidio engine plus extensible custom rules. Catches the usual identifiers (names, emails, phones, SSNs, credit cards) out of the box, and lets your team add domain-specific patterns such as policy numbers, account IDs, and tax IDs so nothing sensitive slips through untagged.

    3. Faithful synthesis

    Preserves column distributions, column-pair correlations, and primary/foreign-key relationships. Downstream tests behave the way they would against production data, without any of the exposure.

    • Runs in your account: deployed from a single AMI into your VPC; data never leaves your AWS boundary and no third-party service sees it.
    • Web UI: an Angular frontend for configuring connections, browsing schemas, selecting tables, reviewing detected PII, and triggering generation. Generated data can be written back to a target database or downloaded as files.
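
    To make the custom-rule idea under "Automated PII detection" concrete, here is a minimal sketch of pattern-based column tagging using only Python's standard `re` module. It is not SDG's rule format and not the Presidio API; the `CustomRule` class, the `tag_columns` helper, the 0.8 threshold, and both example patterns are hypothetical and chosen purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomRule:
    """Hypothetical custom PII rule: a label plus a compiled regex."""
    label: str
    pattern: re.Pattern


# Illustrative patterns only; real identifier formats vary by organization.
RULES = [
    CustomRule("POLICY_NUMBER", re.compile(r"POL-\d{8}")),
    CustomRule("ACCOUNT_ID", re.compile(r"ACCT\d{10}")),
]


def tag_columns(sample_rows, rules=RULES, threshold=0.8):
    """Return {column: label} for columns where at least `threshold`
    of the non-empty sampled values fully match one rule."""
    tags = {}
    if not sample_rows:
        return tags
    for column in sample_rows[0]:
        values = [
            str(row[column])
            for row in sample_rows
            if row.get(column) not in (None, "")
        ]
        if not values:
            continue
        for rule in rules:
            hits = sum(1 for v in values if rule.pattern.fullmatch(v))
            if hits / len(values) >= threshold:
                tags[column] = rule.label
                break  # first matching rule wins
    return tags


rows = [
    {"policy_no": "POL-00012345", "region": "West"},
    {"policy_no": "POL-00067890", "region": "East"},
    {"policy_no": "POL-00011111", "region": "West"},
]
print(tag_columns(rows))  # {'policy_no': 'POLICY_NUMBER'}
```

    Tagging by the share of matching sample values, rather than by a single hit, is one way to avoid flagging a column because of one stray value; the right threshold depends on how clean the source data is.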

    Who it's for

    Data, QA, and platform teams who need production-realistic datasets for development, testing, training, or demos but can't legally or safely use real customer data. Especially useful for regulated industries (finance, insurance, healthcare) where data-sharing boundaries and PII handling are audited.

    What you get

    A hardened Ubuntu 24.04 AMI with SDG preinstalled (gunicorn + nginx), CloudWatch-ready logging, and a built-in 7-day trial license. Full licenses are issued through the automated license workflow of Pacific Data Integrators (PDI): you submit a short registration form, PDI's sales team is notified via Microsoft Teams, and PDI sends you a 30-day license key.

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Ubuntu 24.04

    Pricing

    Synthetic Data Generator

    Pricing and entitlements for this product are managed through an external billing relationship between you and the vendor. You activate the product by supplying a license purchased outside of AWS Marketplace, while AWS provides the infrastructure required to launch the product. AWS Subscriptions have no end date and may be canceled any time. However, the cancellation won't affect the status of the external license.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Vendor refund policy

    Refund Policy

    If you are not satisfied with Synthetic Data Generator, contact Pacific Data Integrators to request a refund. Requests are evaluated case-by-case per our standard terms and your AWS Marketplace subscription.

    AWS infrastructure charges (EC2, EBS, data transfer) are billed by AWS and are not refundable by PDI.

    Email support@pacificdataintegrators.com with your AWS Marketplace subscription ID, purchase date, and reason. We respond within a few business days.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Version release notes

    SDG 2026.1 expands database coverage beyond Oracle, replaces the email-based license workflow with a Microsoft Teams alert so PDI's sales team can respond within hours instead of days, and resolves several build-time issues that prevented earlier versions from booting on freshly baked images.

    What's new

    • Database support for MySQL, PostgreSQL, and SQL Server alongside the existing Oracle and SFTP sources.
    • License registration now notifies PDI's sales team through Microsoft Teams with product, cloud, and customer details, replacing the email workflow.
    • Updated landing-page UX after license registration with a clear "Request received" confirmation and auto-redirect.
    • Built on Ubuntu 24.04 with Python 3.12, gunicorn 23, and the latest security patches from upstream Canonical.

    Fixes

    • Gunicorn now boots cleanly: the compat_data_describer and compat_data_generator modules are properly compiled and bundled into the image.
    • Removed a dead pyspark import path that was preventing synthetic_data_generator from loading at runtime.
    • Fresh built-in 7-day trial license; activation reliably reaches the license service.

    Upgrade notes

    This is a new AMI version; there is no in-place upgrade. Existing customers on earlier versions should launch a fresh instance from 2026.1, re-create their database connections, and continue using the same license key. Licenses issued against earlier AMI versions remain valid until expiration.

    Additional details

    Usage instructions

    1. Launch an EC2 instance from the AMI

    Choose an instance type based on your workload:

    • t3.small: trial / evaluation (minimum)
    • t3.medium or m6i.large: standard production
    • m6i.xlarge or r6i.large: large tables (over 1M rows) or many PII columns

    Accept the default 30 GB root volume for light use; bump to 100-200 GB if you will be generating large synthetic CSV outputs to disk before download.
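
    If you prefer to script the launch, the sizing guidance above can be captured as arguments for the `run_instances` call in boto3 (the AWS SDK for Python). The sketch below only builds the arguments and does not call AWS. The `launch_params` helper, the workload labels, the `ami-EXAMPLE` placeholder, and the `/dev/sda1` root device name are assumptions; check the device name against the AMI before using it.

```python
def launch_params(ami_id, workload="trial", root_volume_gb=30):
    """Build keyword arguments shaped for boto3's EC2 run_instances call.

    `workload` is a hypothetical label mapped to the listing's guidance:
    trial -> t3.small, standard -> t3.medium, large -> m6i.xlarge.
    """
    instance_types = {
        "trial": "t3.small",
        "standard": "t3.medium",
        "large": "m6i.xlarge",
    }
    if workload not in instance_types:
        raise ValueError(f"unknown workload: {workload!r}")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_types[workload],
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                # Assumed root device name; confirm it for this AMI.
                "DeviceName": "/dev/sda1",
                "Ebs": {"VolumeSize": root_volume_gb},
            }
        ],
    }


# Large-table workload with extra disk for CSV output.
params = launch_params("ami-EXAMPLE", workload="large", root_volume_gb=150)
print(params["InstanceType"], params["BlockDeviceMappings"][0]["Ebs"])

# With boto3 installed and credentials configured, the launch would be:
#   import boto3
#   boto3.client("ec2").run_instances(**params)
```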

    2. Configure the security group

    Allow inbound traffic from the IPs that will use SDG:

    Port    Protocol    Purpose
    8082    TCP         SDG web UI and API (required)
    22      TCP         SSH for administration (optional; restrict to admin IPs only)

    Outbound: allow 443/TCP to reach the PDI license service and AWS APIs.
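
    The inbound rules above can also be expressed in the `IpPermissions` structure that boto3's `authorize_security_group_ingress` accepts. This is a sketch under assumptions: the `sdg_ingress_rules` helper is hypothetical, the CIDR ranges are documentation placeholders, and refusing `0.0.0.0/0` is a design choice of this example rather than a product requirement.

```python
import ipaddress


def _validated(cidr):
    """Normalize an IPv4 CIDR string; reject malformed or world-open ranges."""
    net = ipaddress.IPv4Network(cidr)  # raises ValueError on bad input
    if net.prefixlen == 0:
        raise ValueError("refusing 0.0.0.0/0; list specific client ranges")
    return str(net)


def sdg_ingress_rules(user_cidrs, admin_cidrs=()):
    """Return ingress rules for the SDG UI (8082) and optional SSH (22)."""

    def rule(port, cidrs, description):
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {"CidrIp": _validated(c), "Description": description}
                for c in cidrs
            ],
        }

    rules = [rule(8082, user_cidrs, "SDG web UI and API")]
    if admin_cidrs:
        rules.append(rule(22, admin_cidrs, "SSH administration"))
    return rules


rules = sdg_ingress_rules(["203.0.113.0/24"], admin_cidrs=["198.51.100.7/32"])
for r in rules:
    print(r["FromPort"], [ip["CidrIp"] for ip in r["IpRanges"]])

# With boto3, these would be applied to an existing group (ID is a placeholder):
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-EXAMPLE", IpPermissions=rules)
```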

    3. Open the web UI

    Browse to http://INSTANCE_PUBLIC_IP:8082/. The landing page loads a 7-day trial license automatically, so no sign-up is required to evaluate and you can start generating synthetic data immediately.
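
    Before opening a browser, it can help to confirm that the instance answers on port 8082. The sketch below uses only the Python standard library; `sdg_url` and `ui_is_up` are hypothetical helper names, and the address in the usage comment is a documentation placeholder. Treating any HTTP response (including an error status) as "up" is an assumption of this example.

```python
import urllib.error
import urllib.request


def sdg_url(host, port=8082):
    """Build the landing-page URL for an SDG instance."""
    return f"http://{host}:{port}/"


def ui_is_up(url, timeout=5.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or blocked by the security group.
        return False


print(sdg_url("203.0.113.10"))  # http://203.0.113.10:8082/

# Example check (replace the placeholder with your instance's address):
#   ui_is_up(sdg_url("203.0.113.10"))
```

    A `False` result right after launch most often points at the security group or at the instance still booting, so it is worth retrying after a minute before investigating further.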

    4. Request a full license

    From the landing page (or the License Expired modal after the trial ends), click Request License. Fill in:

    • Client Name
    • Email Address
    • State, Country
    • Phone (optional)

    Submit the form. PDI's sales team is notified in real time and will send you a 30-day license key. Paste the key into the Upload License screen to activate.

    5. Connect a data source

    Navigate to Data Connections, then Add Connection. SDG supports:

    • Oracle
    • MySQL
    • PostgreSQL
    • SQL Server
    • SFTP (.csv, .xml, .json, .dat files)

    Enter host, port, credentials, and (for databases) schema. Click Test Connection to verify before saving.
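
    If Test Connection fails, a quick TCP check run from the SDG instance can separate network problems from credential problems. The sketch below is not part of SDG: `SourceConnection` and `port_is_open` are hypothetical names, the host and username are placeholders, and the ports are the conventional defaults for each source type, which your servers may override.

```python
import socket
from dataclasses import dataclass
from typing import Optional

# Conventional default ports; your environment may use different ones.
DEFAULT_PORTS = {
    "oracle": 1521,
    "mysql": 3306,
    "postgresql": 5432,
    "sqlserver": 1433,
    "sftp": 22,
}


@dataclass
class SourceConnection:
    """Hypothetical record of the fields entered in Add Connection."""
    kind: str
    host: str
    username: str
    port: Optional[int] = None
    schema: Optional[str] = None  # databases only

    def __post_init__(self):
        if self.kind not in DEFAULT_PORTS:
            raise ValueError(f"unsupported source type: {self.kind!r}")
        if self.port is None:
            self.port = DEFAULT_PORTS[self.kind]


def port_is_open(conn, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((conn.host, conn.port), timeout=timeout):
            return True
    except OSError:
        return False


conn = SourceConnection("postgresql", "db.internal.example", "sdg_reader", schema="public")
print(conn.port)  # 5432
```

    An open port only shows that the network path works; authentication and schema permissions still need to be verified through Test Connection.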

    6. Generate synthetic data

    1. Navigate to Synthetic Data Generation.
    2. Pick a saved connection.
    3. Select a schema and table.
    4. Review the PII columns SDG auto-detected (you can add or remove columns manually).
    5. Choose how many synthetic rows to generate.
    6. Run the job. When it completes, download the synthetic dataset as CSV or write it back to a target database.
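
    After downloading generated CSVs, you may want to spot-check that key relationships survived. The sketch below is an independent check written for illustration, not an SDG feature: the `orphaned_keys` helper and the table and column names are hypothetical. It lists foreign-key values in a child file that have no matching row in the parent file.

```python
import csv
import io


def orphaned_keys(parent_csv, parent_key, child_csv, child_key):
    """Return sorted child foreign-key values with no matching parent row.

    Both CSV arguments are open text files (or file-like objects).
    Empty foreign-key values are treated as NULL and ignored.
    """
    parent_ids = {row[parent_key] for row in csv.DictReader(parent_csv)}
    child_ids = {
        row[child_key] for row in csv.DictReader(child_csv) if row[child_key]
    }
    return sorted(child_ids - parent_ids)


# In-memory stand-ins for two downloaded files.
customers = io.StringIO("customer_id,name\n1,Ada\n2,Lin\n")
orders = io.StringIO("order_id,customer_id\n10,1\n11,2\n12,3\n")

print(orphaned_keys(customers, "customer_id", orders, "customer_id"))  # ['3']
```

    With real files, pass handles opened with `open(path, newline="")`. An empty list means every child row points at an existing parent row.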

    7. Logs and troubleshooting

    SSH into the instance (as ubuntu) and check:

    Support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
