AWS Big Data Blog

Work with semistructured data using Amazon Redshift SUPER

With the new SUPER data type and the PartiQL language, Amazon Redshift expands data warehouse capabilities to natively ingest, store, transform, and analyze semi-structured data. Semi-structured data (such as weblogs and sensor data) falls under the category of data that doesn’t conform to the rigid schema expected in relational databases. It often contains complex values […]

Increase Amazon Elasticsearch Service performance by upgrading to Graviton2

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Amazon OpenSearch Service supports multiple instance types based on your use case. In 2021, AWS announced general purpose (M6g), compute optimized (C6g), and memory optimized (R6g, R6gd) instance types for Amazon OpenSearch Service version 7.9 or later powered by AWS […]

Design patterns for an enterprise data lake using AWS Lake Formation cross-account access

In this post, we briefly walk through the most common design patterns adopted by enterprises to build lake house solutions that support business agility in a multi-tenant model. These patterns use the AWS Lake Formation cross-account feature to enable a multi-account strategy in which line of business (LOB) accounts produce and consume data from your data […]

Streaming Amazon DynamoDB data into a centralized data lake

For organizations moving towards a serverless microservice approach, Amazon DynamoDB has become a preferred backend database due to its fully managed, multi-Region, multi-active durability with built-in security controls, backup and restore, and in-memory caching for internet-scale applications. Streaming DynamoDB data into a centralized data lake lets you derive near-real-time business insights. The data lake provides capabilities to business teams to plug in […]

Increase Apache Kafka’s resiliency with a multi-Region deployment and MirrorMaker 2

Customers create business continuity plans and disaster recovery (DR) strategies to maximize resiliency for their applications, because downtime or data loss can result in losing revenue or halting operations. Ultimately, DR planning is all about enabling the business to continue running despite a Regional outage. This post explains how to make Apache Kafka resilient to […]

Top 10 Flink SQL queries to try in Amazon Kinesis Data Analytics Studio

Amazon Kinesis Data Analytics Studio makes it easy to analyze streaming data in real time and build stream processing applications using standard SQL, Python, and Scala. With a few clicks on the AWS Management Console, you can launch a serverless notebook to query data streams and get results in seconds. Kinesis Data Analytics reduces the […]

Preprocess logs for anomaly detection in Amazon OpenSearch

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Amazon OpenSearch Service supports real-time anomaly detection, which uses machine learning (ML) to proactively detect anomalies in real-time streaming data. When used to analyze application logs, it can detect anomalies such as unusually high error rates or sudden changes in […]

Automate the archival and deletion of sensitive data using Amazon Macie

This post was updated in May 2022 with a revised AWS CloudFormation template. Customers are looking for ways to securely and cost-efficiently manage the archival and deletion of large volumes of sensitive data in their data lakes while complying with regulations and data protection and privacy laws, such as GDPR, POPIA, and LGPD. This post describes a way to […]

DOCOMO empowers business units with self-service knowledge access thanks to agile AWS QuickSight business intelligence

NTT DOCOMO is the largest telecom company in Japan. It provides innovative, convenient, and secure mobile services that enable customers to realize smarter lives. More than 73 million customers in Japan connect through its advanced wireless networks, including a nationwide LTE network and one of the world’s most progressive LTE Advanced networks. In addition to […]

Amazon Redshift identity federation with multi-factor authentication

Password-based access control alone is not considered secure enough, and many organizations are adopting multi-factor authentication (MFA) and single sign-on (SSO) as a de facto standard to prevent unauthorized access to systems and data. SSO saves time and resources by freeing both administrators and end users from the painful process of password-based credential management. MFA […]
