AWS Big Data Blog
Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. Tens of thousands of customers today rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it […]
Improve OpenSearch Service cluster resiliency and performance with dedicated coordinator nodes
Today, we are announcing dedicated coordinator nodes for Amazon OpenSearch Service domains deployed on managed clusters. When you use Amazon OpenSearch Service to create OpenSearch domains, the data nodes serve dual roles of coordinating data-related requests like indexing requests, and search requests, and of doing the work of processing the requests – indexing documents and […]
Control your AWS Glue Studio development interface with AWS Glue job mode API property
The AWS Glue Jobs API is a robust interface that allows data engineers and developers to programmatically manage and run ETL jobs. To improve customer experience with the AWS Glue Jobs API, we added a new property describing the job mode corresponding to script, visual, or notebook. In this post, we explore how the updated AWS Glue Jobs API works in depth and demonstrate the new experience with the updated API.
How BMW streamlined data access using AWS Lake Formation fine-grained access control
This post explores how BMW implemented AWS Lake Formation’s fine-grained access control (FGAC) in the Cloud Data Hub and how this saves them up to 25% on compute and storage costs. By using AWS Lake Formation fine-grained access control capabilities, BMW has transparently implemented finer data access management within the Cloud Data Hub. The integration of Lake Formation has enabled data stewards to scope and grant granular access to specific subsets of data, reducing costly data duplication.
Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight
In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of resource consumption and associated costs of individual applications running on your EMR cluster.
Achieve the best price-performance in Amazon Redshift with elastic histograms for selectivity estimation
Amazon Redshift now offers enhanced query performance with optimizations such as Enhanced Histograms for Selectivity Estimation in the absence of fresh statistics by relying on metadata statistics gathered during ingestion. In this post, we cover new performance optimizations in Redshift data warehouse query processing and how elastic histogram statistics help enhance selectivity estimation and the overall quality of query plans for Amazon Redshift data warehouse queries in the absence of fresh table statistics.
How to implement access control and auditing on Amazon Redshift using Immuta
This post is co-written with Matt Vogt from Immuta. Organizations are looking for products that let them spend less time managing data and more time on core business functions. Data security is one of the key functions in managing a data warehouse. With Immuta integration with Amazon Redshift, user and data security operations are managed […]
Manage Amazon OpenSearch Service Visualizations, Alerts, and More with GitHub and Jenkins
OpenSearch Service stores different types of stored objects, such as dashboards, visualizations, alerts, security roles, index templates, and more, within the domain. As your user base and number of Amazon OpenSearch Service domains grow, tracking activities and changes to those saved objects becomes increasingly difficult. In this post, we present a solution to deploy stored objects using GitHub and Jenkins while preventing users making direct changes into OpenSearch Service domain
Simplify your query performance diagnostics in Amazon Redshift with Query profiler
Amazon Redshift has introduced a new feature called the Query profiler. The Query profiler is a graphical tool that helps users analyze the components and performance of a query. This feature is part of the Amazon Redshift console and provides a visual and graphical representation of the query’s run order, execution plan, and various statistics. The Query profiler makes it easier for users to understand and troubleshoot their queries. In this post, we cover two common use cases for troubleshooting query performance. We show you step-by-step how to analyze and troubleshoot long-running queries using the Query profiler.
Introducing simplified interaction with the Airflow REST API in Amazon MWAA
Today, we are excited to announce an enhancement to the Amazon MWAA integration with the Airflow REST API. This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically. The Airflow REST API facilitates a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. In this post, we discuss the enhancement and present several use cases that the enhancement unlocks for your Amazon MWAA environment.