AWS Big Data Blog

Tag: Amazon Athena

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune and dbt

In this post, we use dbt for data modeling on both Amazon Athena and Amazon Redshift. dbt on Athena supports real-time queries, while dbt on Amazon Redshift handles complex queries, unifying the development language and significantly reducing the technical learning curve. Using a single dbt modeling language not only simplifies the development process but also automatically generates consistent data lineage information. This approach offers robust adaptability, easily accommodating changes in data structures.

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

Adoption of data lakes and the data mesh framework emerges as a powerful approach. By decentralizing data ownership and distribution, enterprises can break down silos and enable seamless data sharing. In this post, we discuss how to choose the right tool for building an enterprise data platform and enabling data sharing, collaboration and access within your organization and with third-party providers. We address three business use cases using AWS Glue, AWS Data Exchange, AWS Clean Rooms, and Amazon DataZone through three different use cases.

High level architecture of the Estimations system using Athena

How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs

AppsFlyer develops a leading measurement solution focused on privacy, which enables marketers to gauge the effectiveness of their marketing activities and integrates them with the broader marketing world, managing a vast volume of 100 billion events every day. This post explores how AppsFlyer modernized their Audiences Segmentation product by using Amazon Athena.

Operational Data Processing Framework for Modern Data Architectures

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS has invested in native service integration with Apache Hudi and published technical contents to enable you to use Apache Hudi with AWS Glue (for example, refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started). In AWS ProServe-led customer engagements, the use cases we work on usually come with technical complexity and scalability requirements. In this post, we discuss a common use case in relation to operational data processing and the solution we built using Apache Hudi and AWS Glue.

Centralize near-real-time governance through alerts on Amazon Redshift data warehouses for sensitive queries

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud that delivers powerful and secure insights on all your data with the best price-performance. With Amazon Redshift, you can analyze your data to derive holistic insights about your business and your customers. In many organizations, one or multiple Amazon Redshift data warehouses […]

Architecture diagram for the Athena WebSocket API. The user connects to the API through API Gateway. API Gateway uses Lambda and DynamoDB to store session data. SQL queries are routed to Amazon Athena and a Step Function polls for query status and returns the results back to the user.

Access Amazon Athena in your applications using the WebSocket API

In this post, we present a solution that can integrate with your front-end application to query data from Amazon S3 using an Athena synchronous API invocation. With this solution, you can add a layer of abstraction to your application on direct Athena API calls and promote the access using the WebSocket API developed with Amazon API Gateway. The query results are returned back to the application as Amazon S3 presigned URLs.

Optimize Federated Query Performance using EXPLAIN and EXPLAIN ANALYZE in Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In 2019, Athena added support for federated queries to run SQL […]

athena-quicksight-cross-account-architecture

Use Amazon Athena and Amazon QuickSight in a cross-account environment

This blog post was last reviewed and updated May, 2022 to include AWS Lake Formation resource sharing model. Many AWS customers use a multi-account strategy to host applications for different departments within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach, which raises challenges when you need to use […]

How MEDHOST’s cardiac risk prediction successfully leveraged AWS analytic services

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. MEDHOST has been providing products and services to healthcare facilities of all types and sizes for over 35 years. Today, more than 1,000 healthcare facilities are partnering with MEDHOST and enhancing their […]