AWS Big Data Blog

Category: Analytics

Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight

In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of resource consumption and associated costs of individual applications running on your EMR cluster.

Achieve the best price-performance in Amazon Redshift with elastic histograms for selectivity estimation

Amazon Redshift now improves query performance when fresh table statistics are unavailable, using optimizations such as enhanced histograms for selectivity estimation that rely on metadata statistics gathered during ingestion. In this post, we cover new performance optimizations in Amazon Redshift query processing and how elastic histogram statistics improve selectivity estimation and the overall quality of query plans.

How to implement access control and auditing on Amazon Redshift using Immuta

This post is co-written with Matt Vogt from Immuta. Organizations are looking for products that let them spend less time managing data and more time on core business functions. Data security is one of the key functions in managing a data warehouse. With the Immuta integration with Amazon Redshift, user and data security operations are managed […]

Manage Amazon OpenSearch Service Visualizations, Alerts, and More with GitHub and Jenkins

Amazon OpenSearch Service holds different types of stored objects, such as dashboards, visualizations, alerts, security roles, index templates, and more, within the domain. As your user base and number of OpenSearch Service domains grow, tracking activities and changes to those stored objects becomes increasingly difficult. In this post, we present a solution to deploy stored objects using GitHub and Jenkins while preventing users from making direct changes to the OpenSearch Service domain.

Simplify your query performance diagnostics in Amazon Redshift with Query profiler

Amazon Redshift has introduced a new feature called the Query profiler, a graphical tool that helps users analyze the components and performance of a query. This feature is part of the Amazon Redshift console and provides a visual representation of the query’s run order, execution plan, and various statistics, making it easier for users to understand and troubleshoot their queries. In this post, we cover two common use cases for troubleshooting query performance and show you step by step how to analyze and troubleshoot long-running queries using the Query profiler.

How Getir unleashed data democratization using a data mesh architecture with Amazon Redshift

In this post, we explain how ultrafast delivery pioneer, Getir, unleashed the power of data democratization on a large scale through their data mesh architecture using Amazon Redshift. We start by introducing Getir and their vision—to seamlessly, securely, and efficiently share business data across different teams within the organization for BI, extract, transform, and load (ETL), and other use cases. We’ll then explore how Amazon Redshift data sharing powered the data mesh architecture that allowed Getir to achieve this transformative vision.

Apache HBase online migration to Amazon EMR

Apache HBase is an open source, non-relational distributed database developed as part of the Apache Software Foundation’s Hadoop project. HBase can run on Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (Amazon S3), and can host very large tables with billions of rows and millions of columns. The following are some typical use […]

Infor’s Amazon OpenSearch Service Modernization: 94% faster searches and 50% lower costs

In this post, we’ll explore Infor’s journey to modernize its search capabilities, the key benefits they achieved, and the technologies that powered this transformation. We’ll also discuss how Infor’s customers are now able to more effectively search through business messages, documents, and other critical data within the ION OneView platform.

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

The adoption of data lakes and the data mesh framework has emerged as a powerful approach. By decentralizing data ownership and distribution, enterprises can break down silos and enable seamless data sharing. In this post, we discuss how to choose the right tool for building an enterprise data platform and enabling data sharing, collaboration, and access within your organization and with third-party providers. We address three business use cases using AWS Glue, AWS Data Exchange, AWS Clean Rooms, and Amazon DataZone.