AWS Big Data Blog

Category: AWS Lambda

Auto scaling Amazon Kinesis Data Streams using Amazon CloudWatch and AWS Lambda

This post is co-written with Noah Mundahl, Director of Public Cloud Engineering at United Health Group. Update (12/1/2021): Amazon Kinesis Data Streams On-Demand mode is now the recommended way to natively auto scale your Amazon Kinesis Data Streams. In this post, we cover a solution to add auto scaling to Amazon Kinesis Data Streams. Whether […]

Query your Oracle database using Athena Federated Query and join with data in your Amazon S3 data lake

This post was last reviewed and updated July, 2022 with updates in Athena federation connector. If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Oracle as your transactional data store, you may need to join the data in your data lake with Oracle on Amazon Relational Database Service (Amazon RDS), Oracle running on Amazon […]

Create a secure data lake by masking, encrypting data, and enabling fine-grained access with AWS Lake Formation

You can build data lakes with millions of objects on Amazon Simple Storage Service (Amazon S3) and use AWS native analytics and machine learning (ML) services to process, analyze, and extract business insights. You can use a combination of our purpose-built databases and analytics services like Amazon EMR, Amazon OpenSearch Service, and Amazon Redshift as […]

Automate Amazon ES synonym file updates

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Search engines provide the means to retrieve relevant content from a collection of content. However, this can be challenging if certain exact words aren’t entered. You need to find the right item from a catalog of products, or the correct […]

Build a serverless tracking pixel solution in AWS

Let’s describe the typical use case where a tracking pixel solution, also known as a web beacon, might help you: Analyzing web traffic is critical to understanding user behavior in order to improve their experience. Let’s think about a company—Example Company Hotels—that embeds a piece of HTML into a high-traffic, third-party website (example.HighTrafficWebsite.com) to have […]

Automate dynamic mapping and renaming of column names in data files using AWS Glue: Part 1

A common challenge ETL and big data developers face is working with data files that don’t have proper name header records. They’re tasked with renaming the columns of the data files appropriately so that downstream application and mappings for data load can work seamlessly. One example use case is while working with ORC files and […]

Automate dynamic mapping and renaming of column names in data files using AWS Glue: Part 2

In Part 1 of this two-part post, we looked at how we can create an AWS Glue ETL job that is agnostic enough to rename columns of a data file by mapping to column names of another file. The solution focused on using a single file that was populated in the AWS Glue Data Catalog […]

Data monetization and customer experience optimization using telco data assets: Part 2

Part 1 of this series explains the importance of building and implementing a customer experience (CX) management and data monetization strategy for telecom service providers (TSPs), and the major challenges driving these initiatives. It also includes an AWS CloudFormation template to set up a demonstration of the solution using AWS services. It covers transforming and enriching […]

We’ll walk through a solution that takes sets up a recurring Profile job to determine data quality metrics, and using your defined business rules.

Setting up automated data quality workflows and alerts using AWS Glue DataBrew and AWS Lambda

Proper data management is critical to successful, data-driven decision-making. An increasingly large number of customers are adopting data lakes to realize deeper insights from big data. As part of this, you need clean and trusted data in order to gain insights that lead to improvements in your business. As the saying goes, garbage in is […]

Best practices for consuming Amazon Kinesis Data Streams using AWS Lambda

Many organizations are processing and analyzing clickstream data in real time from customer-facing applications to look for new business opportunities and identify security incidents in real time. A common practice is to consolidate and enrich logs from applications and servers in real time to proactively identify and resolve failure scenarios and significantly reduce application downtime. […]