AWS Big Data Blog

Introducing Cold Storage for Amazon Elasticsearch Service

Log analytics is the most popular use case for Elasticsearch, and with the modern-day advent of the architectural tenet to log everything all the time, it can be a challenge to store and analyze this exponential data growth effectively with a minimal price-to-performance tag. With a proliferating number of applications, log volume to be analyzed […]

Read More

How Imperva uses Amazon Athena for machine learning botnets detection

This is a guest post by Ori Nakar, Principal Engineer at Imperva. In their own words, “Imperva is a large cyber security company and an AWS Partner Network (APN) Advanced Technology Partner, who protects web applications and data assets. Imperva protects over 6,200 enterprises worldwide and many of them use Imperva Web Application Firewall (WAF) […]

Read More

Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS

Many customers are gathering large amounts of data, generated from different sources such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic on your data. These processes require complex workflow management to schedule jobs and manage dependencies […]

Read More

Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK

This is a guest blog post by AWS Data Hero Stephane Maarek. AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates […]

Read More

How JPMorgan Chase built a data mesh architecture to drive significant value to enhance their enterprise data platform

This is a joint blog post co-authored with Anu Jain, Graham Person, and Paul Conroy from JPMorgan Chase. Most modern organizations recognize that their data benefits their entire enterprise. Data has value to the individual business process that produces it, but data’s additional potential can be realized when it’s shared and combined with other […]

Read More

Use HyperLogLog for trend analysis with Amazon Redshift

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL. Amazon Redshift offers up to three times better price performance than any other cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of […]

Read More

Effective data lakes using AWS Lake Formation, Part 3: Using ACID transactions on governed tables

Data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for all enterprise data and serve as a common choice for a large number of users querying from a variety of analytics and ML tools. Oftentimes you want to ingest data continuously into the data lake from multiple sources and query against the […]

Read More

Use Grok patterns in AWS Glue to process streaming data into Amazon Elasticsearch Service

Recently, we launched AWS Glue custom connectors for Amazon Elasticsearch Service (Amazon ES), which provides the capability to ingest data into Amazon ES with just a few clicks. You can now use Amazon ES as a data store for your extract, transform, and load (ETL) jobs using AWS Glue and AWS Glue Studio. This integration […]

Read More

Analyzing petabytes of trade and quote data with Amazon FinSpace

We recently announced Amazon FinSpace, a fully managed data management and analytics service that makes it easy to store, catalog, and prepare financial industry data at scale, reducing the time it takes for financial services industry (FSI) customers to find and access all types of financial data for analysis from months to minutes. Financial services organizations […]

Read More

How Digital Infuzion solves the challenge of large-scale scientific data collaboration with Amazon QuickSight

This is a guest post by Digital Infuzion. In their own words, “Digital Infuzion (DIFZ), a leader in information technology, helps solve complex challenges related to genomics, health, and biomedical data, while collaborating with partners including the J. Craig Venter Institute, Gryphon Scientific, ICF International, and others engaged in scientific research. Together, we create novel […]

Read More