AWS Big Data Blog
Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK
This is a guest blog post by AWS Data Hero Stephane Maarek. AWS launched IAM Access Control for Amazon MSK, which is a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates […]
How JPMorgan Chase built a data mesh architecture to drive significant value to enhance their enterprise data platform
April 2024: This post was reviewed for accuracy. This is a joint blog post co-authored with Anu Jain, Graham Person, and Paul Conroy from JPMorgan Chase. Most modern organizations recognize that their data benefits their entire enterprise. Data has value to the individual business process that produces it, but data’s additional potential can be […]
Use HyperLogLog for trend analysis with Amazon Redshift
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL. Amazon Redshift offers up to three times better price performance than any other cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of […]
Monitor data quality in your data lake using PyDeequ and AWS Glue
August 2024: This post was reviewed and updated with examples against a new dataset. Additionally, the architecture was changed to use AWS Glue Studio Notebooks, and information was added on the appropriate Deequ/PyDeequ versions. In our previous post, we introduced PyDeequ, an open-source Python wrapper over Deequ, which enables you to write unit tests on your data […]
Use Grok patterns in AWS Glue to process streaming data into Amazon Elasticsearch Service
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Recently, we launched AWS Glue custom connectors for Amazon OpenSearch Service, which provides the capability to ingest data into Amazon OpenSearch Service with just a few clicks. You can now use Amazon OpenSearch Service as a data store for your […]
How OrthoFi delivers better insights for customers with Amazon Redshift and AWS Glue
This is a guest post by Christa Pierson and Jon Fearer at OrthoFi. OrthoFi is an orthodontic industry leader in revenue cycle management (RCM), and has partnered with more than 550 orthodontic practices across the country, delivering an end-to-end platform that enables orthodontists to bring on more patients and run their businesses more effectively. To […]
Analyzing petabytes of trade and quote data with Amazon FinSpace
We recently announced Amazon FinSpace, a fully managed data management and analytics service that makes it easy to store, catalog, and prepare financial industry data at scale, reducing the time it takes for financial services industry (FSI) customers to find and access all types of financial data for analysis from months to minutes. Financial services organizations […]
How Digital Infuzion solves the challenge of large-scale scientific data collaboration with Amazon QuickSight
This is a guest post by Digital Infuzion. In their own words, “Digital Infuzion (DIFZ), a leader in information technology, helps solve complex challenges related to genomics, health, and biomedical data, while collaborating with partners including the J. Craig Venter Institute, Gryphon Scientific, ICF International, and others engaged in scientific research. Together, we create novel […]
Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow
As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]
Enrich your data stream asynchronously using Amazon Kinesis Data Analytics for Apache Flink
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Streaming data into or out of a data system must be fast. One of the most expensive pieces of any streaming system is the I/O of the […]