AWS Big Data Blog

Category: Analytics

The following diagram shows the architecture EMX uses.

How EMX reduced data pipeline costs by 85% with Amazon Athena

This is a guest blog post by Gary Bouton and Louis Ashner from EMX. In their own words, “ENGINE Media Exchange (EMX) is a leading marketing technology company, leveraging a patented, end-to-end tech stack purpose-built to meet the demands of today’s digital marketplace. The company creates both open- and closed-loop solutions designed to unify advertisers, […]

Detecting anomalous values by invoking the Amazon Athena machine learning inference function

Amazon Athena has released a new feature that allows you to easily invoke machine learning (ML) models for inference directly from your SQL queries. Inference is the stage in which a trained model is used to infer and predict the testing samples and comprises a similar forward pass as training to predict the values. Unlike […]

The following diagram illustrates the workflow.

Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA

May 2024: This post was reviewed and updated with a new dataset. In a previous post, we introduced the Amazon EMR notebook APIs, which allow you to programmatically run a notebook on Amazon EMR Studio (preview) without accessing the AWS web console. With the APIs, you can schedule running EMR notebooks with cron scripts, chain multiple notebooks, […]

The following diagram illustrates this architecture.

How Edmunds GovTech unifies data and analytics data for municipalities with Amazon QuickSight

This is a guest post from an Amazon QuickSight customer, Edmunds GovTech Over the past 30 years, Edmunds GovTech has grown to provide enterprise resource planning (ERP) solutions to thousands of East Coast municipalities. We also serve cities and towns in 25 other states. In this blog, I’ll talk about how we used Amazon QuickSight […]

Data monetization and customer experience optimization using telco data assets: Part 2

Part 1 of this series explains the importance of building and implementing a customer experience (CX) management and data monetization strategy for telecom service providers (TSPs), and the major challenges driving these initiatives. It also includes an AWS CloudFormation template to set up a demonstration of the solution using AWS services. It covers transforming and enriching […]

Building a cost efficient, petabyte-scale lake house with Amazon S3 lifecycle rules and Amazon Redshift Spectrum: Part 2

In part 1 of this series, we demonstrated building an end-to-end data lifecycle management system integrated with a data lake house implemented on Amazon Simple Storage Service (Amazon S3) with Amazon Redshift and Amazon Redshift Spectrum. In this post, we address the ongoing operation of the solution we built. Data ageing process after a month […]

Centrally tracking dashboard lineage, permissions, and more with Amazon QuickSight administrative dashboards

This post is co-written with Shawn Koupal, an Enterprise Analytics IT Architect at Best Western International, Inc. A common ask from Amazon QuickSight administrators is to understand the lineage of a given dashboard (what analysis is it built from, what datasets are used in the analysis, and what data sources do those datasets use). QuickSight […]

In the third scenario, we set up a connection where we connect to Oracle 18 and MySQL 8 using external drivers from AWS Glue ETL, extract the data, transform it, and load the transformed data to Oracle 18.

Building AWS Glue Spark ETL jobs by bringing your own JDBC drivers for Amazon RDS

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue has native connectors to connect to supported data sources either on AWS or elsewhere using JDBC drivers. Additionally, AWS Glue now enables you to bring your own JDBC drivers […]

These data sources cover the following categories:

Developing, testing, and deploying custom connectors for your data stores with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue already integrates with various popular data stores such as the Amazon Redshift, RDS, MongoDB, and Amazon S3. Organizations continue to evolve and use a variety of data stores that best fit […]

7. Choose Continue to Launch.

Migrating data from Google BigQuery to Amazon S3 using AWS Glue custom connectors

July 2022: This post was reviewed and updated to include a mew data point on the effective runtime with the latest version, explaining Glue 3,0 and autoscaling. In today’s connected world, it’s common to have data sitting in various data sources in a variety of formats. Even though data is a critical component of decision […]