AWS Big Data Blog

Category: AWS Big Data

Unlocking the value of data as your differentiator

Today on the AWS re:Invent keynote stage, Swami Sivasubramanian, VP of Data and AI, AWS, spoke about the beneficial relationship among data, generative AI, and humans—all working together to unleash new possibilities in efficiency and creativity. There has never been a more exciting time in modern technology. Innovation is accelerating everywhere, and the future is […]

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

This post is written in collaboration with Philipp Karg and Alex Gutfreund  from BMW Group. Bayerische Motoren Werke AG (BMW) is a motor vehicle manufacturer headquartered in Germany with 149,475 employees worldwide and the profit before tax in the financial year 2022 was € 23.5 billion on revenues amounting to € 142.6 billion. BMW Group is one of the […]

Clean up your Excel and CSV files without writing code using AWS Glue DataBrew

Managing data within an organization is complex. Handling data from outside the organization adds even more complexity. As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. In this blog post, we’ll explore a […]

Connect your data for faster decisions with AWS

The most impactful data-driven insights come from connecting the dots between all your data sources—across departments, services, on-premises tools, and third-party applications. But typically, connecting data requires complex extract, transform, and load (ETL) pipelines, taking hours or days. That’s too slow for decision-making speed. ETL needs to be easier and sometimes eliminated. AWS is investing […]

Unlock scalable analytics with AWS Glue and Google BigQuery

Data integration is the foundation of robust data analytics. It encompasses the discovery, preparation, and composition of data from diverse sources. In the modern data landscape, accessing, integrating, and transforming data from diverse sources is a vital process for data-driven decision-making. AWS Glue, a serverless data integration and extract, transform, and load (ETL) service, has […]

Create, train, and deploy Amazon Redshift ML model integrating features from Amazon SageMaker Feature Store

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Data analysts and database developers want to use this data to train machine learning (ML) models, which can then be used to generate insights on new data for use cases such as forecasting […]

Unstructured Data Management - AWS Native Architecture

Unstructured data management and governance using AWS AI/ML and analytics services

In this post, we discuss how AWS can help you successfully address the challenges of extracting insights from unstructured data. We discuss various design patterns and architectures for extracting and cataloging valuable insights from unstructured data using AWS. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.

Orchestrate Amazon EMR Serverless jobs with AWS Step functions

Amazon EMR Serverless provides a serverless runtime environment that simplifies the operation of analytics applications that use the latest open source frameworks, such as Apache Spark and Apache Hive. With EMR Serverless, you don’t have to configure, optimize, secure, or operate clusters to run applications with these frameworks. You can run analytics workloads at any scale with automatic […]

Query big data with resilience using Trino in Amazon EMR with Amazon EC2 Spot Instances for less cost

New enhancements in Trino with Amazon EMR provide improved resiliency for running ETL and batch workloads on Spot Instances with reduced costs. This post showcases the resilience of Amazon EMR with Trino using fault-tolerant configuration to run long-running queries on Spot Instances to save costs. We simulate Spot interruptions on Trino worker nodes by using AWS Fault Injection Simulator (AWS FIS).