AWS Cloud Enterprise Strategy Blog
Fuel Your Data with Generative AI
Data is the fuel for generative AI. Vast amounts of data and the cloud’s crucial ability to store and process data at that scale drove the rapid rise of powerful foundation models. You can fine-tune those models or use retrieval augmented generation (RAG) to tailor them to your business, provided you can corral your enterprise’s scattered data and make it available.
But the relationship between data and AI goes both ways: AI can also be used to improve and enhance your data and make it available for analysis.
While companies have invested heavily in data over the last few years, they often find that it hasn’t been enough; the rise of AI has drawn their attention to gaps in their data and difficulties accessing or interpreting it. Data may be isolated in organizational silos. Its quality may be poor or it may be incomplete. It may be difficult to work with or to interpret.
Below, I’ll give three examples where you can use AI to fuel your data, rather than vice-versa. As you are choosing priorities among the many possible use cases for AI, you might find that use cases like these deserve a place at the top of the list. They may get you quick wins while also generating value from your data asset.
Reducing Extremely Tedious Labor (ETL!)
One of the most resource-intensive tasks in any data project, often consuming as much as 60-70% of the effort, is preparing and moving data to be used for analytics—so-called extract, transform, and load (ETL) processes. This heavy burden is why AWS is working toward a zero-ETL future.
Fortunately, you can use generative AI to automatically analyze your source and target data structures, and then help in mapping one into the other. AWS’s generative AI coding assistant, Amazon Q Developer, can build data integration pipelines using natural language. This not only reduces the time and effort required but also helps maintain consistency across different ETL processes, making ongoing support and maintenance easier. Enterprises often find that they have both structured (e.g., customer profiles and sales orders) and unstructured (e.g., social media or customer feedback) data, and that it is held in a variety of data sources, formats, schemas, and data types. The Amazon Q data integration in AWS Glue can generate ETL jobs for over 20 common data sources, including PostgreSQL, MySQL, Oracle, Amazon Redshift, Snowflake, Google BigQuery, DynamoDB, MongoDB, and OpenSearch.
With generative AI for ETL and data pipelines, data engineers, analysts, and scientists can spend more time solving business problems and deriving insights from your data and less time laying out the plumbing. It is a generative AI use case that most enterprises can start today.
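To make the idea of mapping a source structure onto a target structure concrete, here is a minimal sketch in Python. It is not how Amazon Q or AWS Glue works internally; it swaps in plain string similarity from the standard library as a stand-in for the model, and the column names, the `suggest_mapping` helper, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and drop separators, so 'Cust_ID' becomes 'custid'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mapping(source_cols, target_cols, threshold=0.6):
    """Suggest a source -> target column mapping by name similarity.

    Source columns with no candidate at or above the threshold map to None,
    so a person can review them instead of trusting a weak guess.
    """
    mapping = {}
    for src in source_cols:
        best, best_score = None, 0.0
        for tgt in target_cols:
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best, best_score = tgt, score
        mapping[src] = best if best_score >= threshold else None
    return mapping


# Hypothetical schemas, invented for this example.
source = ["Cust_ID", "FirstName", "order_total", "legacy_flag"]
target = ["customer_id", "first_name", "order_total_usd", "email"]

mapping = suggest_mapping(source, target)
for src, tgt in mapping.items():
    print(f"{src} -> {tgt}")
```

A generative model can go further than name matching, for example by reading sample values or column descriptions, but the shape of the output is similar: a proposed mapping that an engineer reviews rather than writes from scratch.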
Generative BI: Better Insights, Faster
We often speak of democratizing data across an organization: taking it out of the hands of only the specialists and making it available to everyone. One barrier to democratization is that not everyone has the skills to work rigorously and creatively with data. Another is that data analysts and data scientists are often burdened with large, complex projects, which limits their ability to deliver daily, actionable insights to everyone else.
With generative AI, you can interact with your data using conversational queries and natural language. You don’t have to wait for someone to build reports and dashboards to find information, which reduces time to value. A retail executive can ask, “What were our top-performing product categories last quarter, and what factors contributed to their success?” Regional supply chain specialists at BMW Group, a global manufacturer of premium automobiles and motorcycles, have been using the generative AI assistant Amazon Q in QuickSight to quickly respond to supply chain visibility requests from senior stakeholders like board members.
Data has the power to influence change—but that requires compelling storytelling. Generative AI can make data easy to work with and enjoyable to use by creating visually appealing documents and presentations that bring the data to life.
A side benefit is that it can help people across the organization become more familiar with the data and its interpretation, making the data all the more valuable and useful for more complex AI applications.
Synthetic Data: Get the Data You Want
As they become more mature with analytics and AI, many enterprises find they don’t have all the data they need for the new use cases they imagine. Acquiring third-party data can be prohibitively expensive. In regulated industries like healthcare and financial services, where data privacy and security are paramount, using actual customer data may not be possible. Data required to test edge cases in business processes is often limited.
You can use AI-generated high-fidelity synthetic data for testing, training, and innovation. It mimics the statistical properties and patterns of real datasets while preserving privacy and eliminating sensitive information. You can also use it to augment data for AI model training where data is scarce or sensitive. Executives can use synthetic data for scenario planning to model various business situations and test strategies to mitigate and reduce risk. Merck, a global pharmaceutical company, uses synthetic data and AWS services to reduce false reject rates in their drug inspection process. They have reduced their false reject rate by 50 percent by developing synthetic defect image data with tools like generative adversarial networks (deep learning models that pit two neural networks against each other to generate new synthetic data) and variational autoencoders (generative neural networks that compress data into a compact representation and then reconstruct it, learning to generate new data in the process).
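As a rough illustration of what "mimicking the statistical properties" of a dataset means, the sketch below fits the means, spreads, and correlation of two numeric columns and then samples new rows from that fit. It is a deliberately simple stand-in that assumes a bivariate Gaussian; it is not the GAN or variational autoencoder approach described above, and the "real" dataset here is itself made up for the example. A simple fit like this also does not guarantee privacy on its own.

```python
import random
import statistics


def fit_bivariate(rows):
    """Estimate the means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from a bivariate Gaussian with the fitted parameters."""
    mx, my, sx, sy, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mix z1 into y so the synthetic columns keep the fitted correlation.
        y = my + sy * (rho * z1 + (1 - rho**2) ** 0.5 * z2)
        rows.append((x, y))
    return rows


rng = random.Random(42)

# Hypothetical "real" data: customer age and monthly spend, invented for this sketch.
real = []
for _ in range(5000):
    age = rng.gauss(40, 10)
    real.append((age, 200 + 3 * age + rng.gauss(0, 25)))

real_params = fit_bivariate(real)
synthetic = sample_synthetic(real_params, 5000, rng)
synth_params = fit_bivariate(synthetic)

print("real     :", [round(p, 2) for p in real_params])
print("synthetic:", [round(p, 2) for p in synth_params])
```

The two printed summaries should be close, even though no synthetic row is a copy of a real one. Production tools use far richer models to capture non-Gaussian shapes, categorical fields, and images.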
AI-generated synthetic data can unleash innovation and help you create delightful customer experiences. Amazon One is a fast, convenient service that allows the customer to make a payment, present a loyalty card, verify their age, or enter a venue using only their palm. Amazon needed a large dataset of palm images to train the system, including variations in lighting, hand poses, and conditions like the presence of a bandage. Using AI-generated synthetic data, the team even trained the system to detect highly detailed silicone hand replicas. Customers have already used Amazon One more than three million times with 99.9999 percent accuracy.
AI and Data as Symbiotic
These three examples show how you can use generative AI to unlock the potential of your data, extracting value more quickly and demonstrating tangible wins with generative AI. From automating tedious data integration tasks to empowering business users with conversational analytics, generative AI can help your teams work smarter, not harder. And by generating synthetic data for testing and innovation, you can fuel new ideas and capabilities that were previously out of reach. The surprising key is to view not just your data as the fuel for generative AI, but also generative AI as a powerful new tool you can apply to your data. – Ishit
As published on Analytics India Magazine – https://analyticsindiamag.com/ai-highlights/fuel-your-data-with-generative-ai/