AWS Big Data Blog

Category: AWS Big Data

Optimize your Amazon Redshift query performance with automated materialized views

Amazon Redshift is a fast, fully managed cloud data warehouse database that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Amazon Redshift allows you to analyze structured and semi-structured data and seamlessly query data lakes and operational databases, using AWS designed hardware and automated machine learning (ML)-based tuning to […]

Use the AWS Glue connector to read and write Apache Iceberg tables with ACID transactions and perform time travel

Nowadays, many customers have built their data lakes as the core of their data analytic systems. In a typical use case of data lakes, many concurrent queries run to retrieve consistent snapshots of business insights by aggregating query results. A large volume of data constantly comes from different data sources into the data lakes. There […]

Comparing throughput and put latencies of different broker sizes

Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost

Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how […]

Unify log aggregation and analytics across compute platforms

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this gets challenging with the growing volume of data needing to be quickly detected, analyzed, and stored. In this post, we […]

Enforce customized data quality rules in AWS Glue DataBrew

GIGO (garbage in, garbage out) is a concept common to computer science and mathematics: the quality of the output is determined by the quality of the input. In modern data architecture, you bring data from different data sources, which creates challenges around volume, velocity, and veracity. You might write unit tests for applications, but it’s […]

architecture diagram

Create a serverless event-driven workflow to ingest and process Microsoft data with AWS Glue and Amazon EventBridge

Microsoft SharePoint is a document management system for storing files, organizing documents, and sharing and editing documents in collaboration with others. Your organization may want to ingest SharePoint data into your data lake, combine the SharePoint data with other data that’s available in the data lake, and use it for reporting and analytics purposes. AWS […]

How MEDHOST’s cardiac risk prediction successfully leveraged AWS analytic services

MEDHOST has been providing products and services to healthcare facilities of all types and sizes for over 35 years. Today, more than 1,000 healthcare facilities are partnering with MEDHOST and enhancing their patient care and operational excellence with its integrated clinical and financial EHR solutions. MEDHOST also offers a comprehensive Emergency Department Information System with […]

Query a Teradata database using Amazon Athena Federated Query and join with data in your Amazon S3 data lake

If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Teradata as your transactional data store, you may need to join the data in your data lake with Teradata in the cloud, Teradata running on Amazon Elastic Compute Cloud (Amazon EC2), or with an on-premises Teradata database, for example to build […]

Query Snowflake using Athena Federated Query and join with data in your Amazon S3 data lake

This post was last reviewed and updated July, 2022 with updates in Athena federation connector. If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Snowflake as your data warehouse solution, you may need to join your data in your data lake with Snowflake. For example, you may want to build […]

You can plot the output into a chart using matplotlib.

Analyzing petabytes of trade and quote data with Amazon FinSpace

We recently announced Amazon FinSpace, a fully-managed data management and analytics service that makes it easy to store, catalog, and prepare financial industry data at scale, reducing the time it takes for financial services industry (FSI) customers to find and access all types of financial data for analysis from months to minutes. Financial services organizations […]