AWS Big Data Blog
Building AWS Data Lake visualizations with Amazon Athena and Tableau
July 2023: This post was reviewed for accuracy. Amazon Athena is an interactive query service that makes it easy to analyze data in a data lake using standard SQL. One of the key elements of Athena is that you only pay for the queries you run. This is an attractive feature because there is no […]
How EMX reduced data pipeline costs by 85% with Amazon Athena
This is a guest blog post by Gary Bouton and Louis Ashner from EMX. In their own words, “ENGINE Media Exchange (EMX) is a leading marketing technology company, leveraging a patented, end-to-end tech stack purpose-built to meet the demands of today’s digital marketplace. The company creates both open- and closed-loop solutions designed to unify advertisers, […]
Detecting anomalous values by invoking the Amazon Athena machine learning inference function
Amazon Athena has released a new feature that allows you to easily invoke machine learning (ML) models for inference directly from your SQL queries. Inference is the stage in which a trained model predicts values for test samples, using a forward pass similar to the one performed during training. Unlike […]
Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA
May 2024: This post was reviewed and updated with a new dataset. In a previous post, we introduced the Amazon EMR notebook APIs, which allow you to programmatically run a notebook on Amazon EMR Studio (preview) without accessing the AWS web console. With the APIs, you can schedule running EMR notebooks with cron scripts, chain multiple notebooks, […]
How Edmunds GovTech unifies data and analytics data for municipalities with Amazon QuickSight
This is a guest post from an Amazon QuickSight customer, Edmunds GovTech. Over the past 30 years, Edmunds GovTech has grown to provide enterprise resource planning (ERP) solutions to thousands of East Coast municipalities. We also serve cities and towns in 25 other states. In this blog, I’ll talk about how we used Amazon QuickSight […]
Data monetization and customer experience optimization using telco data assets: Part 2
Part 1 of this series explains the importance of building and implementing a customer experience (CX) management and data monetization strategy for telecom service providers (TSPs), and the major challenges driving these initiatives. It also includes an AWS CloudFormation template to set up a demonstration of the solution using AWS services. It covers transforming and enriching […]
Building a cost efficient, petabyte-scale lake house with Amazon S3 lifecycle rules and Amazon Redshift Spectrum: Part 2
In part 1 of this series, we demonstrated building an end-to-end data lifecycle management system integrated with a data lake house implemented on Amazon Simple Storage Service (Amazon S3) with Amazon Redshift and Amazon Redshift Spectrum. In this post, we address the ongoing operation of the solution we built. Data ageing process after a month […]
Centrally tracking dashboard lineage, permissions, and more with Amazon QuickSight administrative dashboards
This post is co-written with Shawn Koupal, an Enterprise Analytics IT Architect at Best Western International, Inc. A common ask from Amazon QuickSight administrators is to understand the lineage of a given dashboard (which analysis it is built from, which datasets that analysis uses, and which data sources those datasets use). QuickSight […]
Building AWS Glue Spark ETL jobs by bringing your own JDBC drivers for Amazon RDS
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue has native connectors to connect to supported data sources either on AWS or elsewhere using JDBC drivers. Additionally, AWS Glue now enables you to bring your own JDBC drivers […]
Developing, testing, and deploying custom connectors for your data stores with AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue already integrates with various popular data stores such as Amazon Redshift, RDS, MongoDB, and Amazon S3. Organizations continue to evolve and use a variety of data stores that best fit […]