AWS Big Data Blog

Category: Database

Build seamless data streaming pipelines with Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose for Amazon DynamoDB tables

The global wearables market grew 35.1% year over year during the third quarter of 2020, with total shipments reaching 125 million units according to new data from the International Data Corporation (IDC) Worldwide Quarterly Wearable Device Tracker. The surge was driven by seasonality, new product launches, and the health concerns during the global pandemic. Given […]

Read More

Build a data lake using Amazon Kinesis Data Streams for Amazon DynamoDB and Apache Hudi

Amazon DynamoDB helps you capture high-velocity data such as clickstream data to form customized user profiles and online order transaction data to develop customer order fulfillment applications, improve customer satisfaction, and get insights into sales revenue to create a promotional offer for the customer. It’s essential to store these data points in a centralized data […]

Read More

Implementing multi-tenant patterns in Amazon Redshift using data sharing

Software service providers offer subscription-based analytics capabilities in the cloud with Analytics as a Service (AaaS), and increasingly customers are turning to AaaS for business insights. A multi-tenant storage strategy allows the service providers to build a cost-effective architecture to meet increasing demand. Multi-tenancy means a single instance of software and its supporting infrastructure is […]

Read More

Building AWS Glue Spark ETL jobs by bringing your own JDBC drivers for Amazon RDS

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue has native connectors to connect to supported data sources either on AWS or elsewhere using JDBC drivers. Additionally, AWS Glue now enables you to bring your own JDBC drivers […]

Read More

Building AWS Glue Spark ETL jobs using Amazon DocumentDB (with MongoDB compatibility) and MongoDB

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue has native connectors to connect to supported data sources on AWS or elsewhere using JDBC drivers. Additionally, AWS Glue now supports reading and writing to Amazon DocumentDB (with MongoDB […]

Read More

The best new features for data analysts in Amazon Redshift in 2020

This is a guest post by Helen Anderson, data analyst and AWS Data Hero. Every year, the Amazon Redshift team launches new and exciting features, and 2020 was no exception. New features to improve the data warehouse service and add interoperability with other AWS services were rolling out all year. I am part of a […]

Read More

Building a real-time notification system with Amazon Kinesis Data Streams for Amazon DynamoDB and Amazon Kinesis Data Analytics for Apache Flink

Amazon DynamoDB helps you capture high-velocity data such as clickstream data to form customized user profiles and Internet of Things (IoT) data so that you can develop insights on sensor activity across various industries, including smart spaces, connected factories, smart packing, fitness monitoring, and more. It’s important to store these data points in a centralized […]

Read More

Dream11’s journey to building their Data Highway on AWS

This is a guest post co-authored by Pradip Thoke of Dream11. In their own words, “Dream11, the flagship brand of Dream Sports, is India’s biggest fantasy sports platform, with more than 100 million users. We have infused the latest technologies of analytics, machine learning, social networks, and media technologies to enhance our users’ experience. Dream11 […]

Read More

Sharing Amazon Redshift data securely across Amazon Redshift clusters for workload isolation

Amazon Redshift data sharing allows for a secure and easy way to share live data for read purposes across Amazon Redshift clusters. Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. It allows […]

Read More

Accelerating Amazon Redshift federated query to Amazon Aurora MySQL with AWS CloudFormation

Amazon Redshift federated query allows you to combine data from one or more Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon Aurora MySQL databases with data already in Amazon Redshift. You can also combine such data with data in an Amazon Simple Storage Service (Amazon S3) data lake. This post shows you how […]

Read More