AWS Big Data Blog

Category: Amazon DynamoDB

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what […]

Near Zero Downtime Migration from MySQL to DynamoDB

Many companies consider migrating from relational databases like MySQL to Amazon DynamoDB, a fully managed, fast, highly scalable, and flexible NoSQL database service. For example, DynamoDB can increase or decrease capacity based on traffic, in accordance with business needs. The total cost of servicing can be optimized more easily than for the typical media-based RDBMS. […]

Data Lake Ingestion: Automatically Partition Hive External Tables with AWS

In this post, I introduce a simple data ingestion and preparation framework based on AWS Lambda, Amazon DynamoDB, and Apache Hive on EMR for data from different sources landing in S3. Hive by itself cannot detect new partitions as data lands, so this solution lets Hive pick them up as data is loaded into S3.
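
As a rough illustration of the partition pickup idea only, and not the framework from the post itself, the sketch below turns an S3 object key into a Hive `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement. It assumes a Hive-style key layout with `name=value` path segments. The function names, table name, and bucket name are hypothetical, and how the statement is submitted to Hive is left out.

```python
import re

# Assumption: partition directories in the S3 key look like "name=value".
PARTITION_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=([^/=']+)$")


def partition_spec_from_key(key):
    """Return (column, value) pairs found in the directory part of an S3 key."""
    pairs = []
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        match = PARTITION_SEGMENT.match(segment)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def add_partition_statement(table, bucket, key):
    """Build the Hive DDL that registers the partition holding this object.

    Returns None when the key contains no name=value segments.
    """
    pairs = partition_spec_from_key(key)
    if not pairs:
        return None
    spec = ", ".join("{}='{}'".format(col, val) for col, val in pairs)
    prefix = key.rsplit("/", 1)[0]  # directory that contains the new object
    return (
        "ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({}) "
        "LOCATION 's3://{}/{}/'".format(table, spec, bucket, prefix)
    )


if __name__ == "__main__":
    # Hypothetical table, bucket, and key, for illustration only.
    print(add_partition_statement(
        "web_logs", "my-bucket", "logs/dt=2016-05-01/hour=03/part-0000.gz"))
```

`IF NOT EXISTS` keeps the statement safe to repeat, which matters when several objects land in the same partition and each arrival triggers the same DDL.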

Monitor Your Application for Processing DynamoDB Streams

In this post, I suggest ways you can monitor the Amazon Kinesis Client Library (KCL) application you use to process DynamoDB Streams to quickly track and resolve issues or failures so you can avoid losing data. Dashboards, metrics, and application logs all play a part. This post may be most relevant to Java applications running on Amazon EC2 instances.

Process Large DynamoDB Streams Using Multiple Amazon Kinesis Client Library (KCL) Workers

Asmita Barve-Karandikar is an SDE with DynamoDB. Imagine you own a popular mobile health app, with millions of users worldwide, that continuously records new information. It sends over one million updates per second to its master data store and needs the updates to be relayed to various replicas across different regions in real time. […]

Processing Amazon DynamoDB Streams Using the Amazon Kinesis Client Library

Asmita Barve-Karandikar is an SDE with DynamoDB. Customers often want to process streams on an Amazon DynamoDB table with a significant number of partitions or with a high throughput. AWS Lambda and the DynamoDB Streams Kinesis Adapter are two ways to consume DynamoDB streams in a scalable way. While Lambda lets you run your application […]

Using Spark SQL for ETL

Ben Snively is a Solutions Architect with AWS. With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]

Real-time in-memory OLTP and Analytics with Apache Ignite on AWS

Babu Elumalai is a Solutions Architect with AWS. Organizations are generating tremendous amounts of data, and they increasingly need tools and systems that help them use this data to make decisions. The data has both immediate value (for example, trying to understand how a new promotion is performing in real time) and historic value (trying […]

Analyze Your Data on Amazon DynamoDB with Apache Spark

Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]