Generative AI Development Disclosure
I. Our Generative AI Services
This Generative AI Development Disclosure describes the data Amazon Web Services, Inc. and its affiliates (collectively, “AWS” or “we”) use to develop or deploy our generative AI models and services (“generative AI services” or “services”). We develop AI services to help customers accelerate innovation and transform their businesses. Our generative AI services provide a range of capabilities, from natural language processing to software development and business intelligence tools. Our generative AI services may be powered by foundation models, which include models built by Amazon as well as by third parties. We may use multiple models, selecting among them to optimize performance, match the best model to the relevant task, and incorporate the latest capabilities.
II. Responsible Training Data Practices
For the generative AI services we make available to our customers, we train and test on a range of data intended to enhance our services’ capabilities. We may train and test on licensed and proprietary datasets, synthetic datasets, open-source datasets, and publicly available content (including web crawled data). These datasets may include text, images, audio, video, code, and other types of data relevant to the service’s purpose. These datasets may contain public domain content, rights-protected material, and in some cases, personal information or aggregate consumer information. We train and select the models that power our generative AI services to help deliver more accurate, helpful, and relevant responses, and to help support the features and functionality of the service, such as by responding to natural language queries, recognizing visual content, generating relevant recommendations, or creating useful content.
We use various techniques to curate training data, which may include human and automated annotation, automated quality indicators, preference ranking, and other methods. We also implement multiple safeguards throughout our training data practices, including techniques to help limit the impact of any processing of personal information in connection with training generative AI services. For example, we may use processes like training data deduplication to remove repetitive data that could cause models to overweight certain patterns or reproduce specific content.
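To illustrate the general idea of training data deduplication, the sketch below shows exact-match deduplication via content hashing. It is a minimal, hypothetical example for explanatory purposes only; the dataset format, normalization, and hashing choice are assumptions and do not describe AWS’s actual data pipelines.

```python
import hashlib

def deduplicate_exact(records):
    """Drop records whose normalized text has already been seen.

    Exact-match deduplication via content hashing; real pipelines
    typically layer near-duplicate detection (e.g., MinHash) on top.
    """
    seen = set()
    unique = []
    for text in records:
        # Normalize whitespace and case so trivial variants collapse together.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

# Hypothetical corpus: the second record is a duplicate after normalization.
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown  FOX jumps over the lazy dog.",
    "An entirely different sentence.",
]
print(deduplicate_exact(corpus))  # two unique records remain
```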
The size of our training and testing data varies by model or service, and could range from thousands to trillions of data points. We have been collecting data since before 2022, with different models beginning development at different times. Data collection, training, and testing are ongoing processes as we continuously improve our services and incorporate new capabilities.
III. Evaluation for Quality
We test and evaluate our generative AI services to assess whether they meet our quality standards and perform as intended. We assess performance, accuracy, and reliability for our services’ intended uses. Our methodologies may include automated and human evaluation, benchmarking against established industry standards, simulating real-world usage patterns and edge cases, and evaluating outputs across various conditions and contexts. We test using data and modalities relevant to the goals and functionality of the generative AI service. We evaluate generative AI service performance through various methods, such as monitoring metrics, incorporating user feedback, and conducting periodic assessments as appropriate for the service.
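As a simplified illustration of automated benchmarking, the hypothetical sketch below scores a model’s answers against a small set of reference answers using exact match. The benchmark items, the generate placeholder, and the scoring rule are assumptions for explanation only and do not represent AWS’s evaluation harnesses.

```python
def exact_match_score(predictions, references):
    """Fraction of predictions that exactly match the reference answer."""
    matches = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical benchmark items: a prompt plus its expected answer.
benchmark = [
    {"prompt": "What is the capital of France?", "answer": "Paris"},
    {"prompt": "2 + 2 =", "answer": "4"},
]

def generate(prompt):
    # Placeholder for a model call; a real harness would invoke the service under test.
    canned = {"What is the capital of France?": "Paris", "2 + 2 =": "4"}
    return canned[prompt]

preds = [generate(item["prompt"]) for item in benchmark]
refs = [item["answer"] for item in benchmark]
print(f"exact match: {exact_match_score(preds, refs):.2f}")
```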
For example, for frontier models like Amazon Nova Premier, we conducted comprehensive safety evaluations, including expert red teaming across critical risk domains such as Chemical, Biological, Radiological & Nuclear (CBRN) capabilities, offensive cyber operations, and automated AI research and development. These evaluations engaged both internal experts and independent third-party evaluators to identify potential risks before deployment.
As part of our quality evaluation process, we also implement appropriate safeguards for our generative AI services. These safeguards may include output filtering or safety controls designed to enable our generative AI services to provide trustworthy responses. Our approaches are tailored to each service’s purpose and capabilities.
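As a schematic illustration of output filtering, the sketch below withholds a response when it matches a blocked pattern. The pattern list and refusal message are hypothetical; production safety controls are substantially more sophisticated and are tailored to each service’s purpose and capabilities.

```python
import re

# Hypothetical blocklist of patterns the filter refuses to pass through.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # resembles a US Social Security number
]

REFUSAL_MESSAGE = "The response was withheld by a safety filter."

def filter_output(text):
    """Return the model output, or a refusal if any blocked pattern matches."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return REFUSAL_MESSAGE
    return text

print(filter_output("Your order number is 12345."))   # passes through unchanged
print(filter_output("My SSN is 123-45-6789."))        # withheld by the filter
```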
IV. Learn More
We are committed to building AI responsibly, with appropriate safeguards for safety, accuracy, privacy, and security. For more information about our approach to responsible AI, see our Responsible AI at Amazon page. For more information about specific generative AI services and features, please see the applicable documentation.
For information about how AWS collects and uses personal information collected in relation to AWS offerings, please see the AWS Privacy Notice. For information about how AWS handles content processed by AWS services, please see our AWS Customer Agreement and the applicable AWS Service Terms.
We are also working to advance responsible AI and foster innovation that balances progress with responsibility. This includes ongoing investment in AI safety research, participation in industry standards development, and collaboration with industry partners, governments, academic institutions, and safety organizations to advance the field.