AWS Big Data Blog

Category: Analytics

Set up cross-account audit logging for your Amazon Redshift cluster

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the best practices of modern application design is to have centralized logging. Troubleshooting application problems is easy when you can correlate […]

Federate access to Amazon Redshift using the JDBC browser plugin for Single Sign-on authentication with Microsoft Azure Active Directory

Since 2020, Amazon Redshift has supported multi-factor authentication (MFA) to any SAML 2.0 compliant identity provider (IdP) in our JDBC and ODBC drivers. You can map the IdP user identity and group memberships in order to control authorization for database objects in Amazon Redshift. This simplifies administration by enabling you to manage user access in […]

Optimize your analytical workloads using the automatic query rewrite feature of Amazon Redshift materialized views

Amazon Redshift materialized views enable you to significantly improve performance of complex queries that are frequently run as part of your extract, load, and transform (ELT), business intelligence (BI), or dashboarding applications. Materialized views precompute and store the result sets of the SQL query in the view definition. Materialized views speed up data access, because […]

How Belcorp decreased cost and improved reliability in its big data processing framework using Amazon EMR managed scaling

This is a guest post by Diego Benavides and Luis Bendezú, Senior Data Architects, Data Architecture Direction at Belcorp. Belcorp is one of the main consumer packaged goods (CPG) companies providing cosmetics products in the region for more than 50 years, allocated to around 13 countries in North, Central, and South America (AMER). Born in Peru […]

Power highly resilient use cases with Amazon Redshift

Amazon Redshift is the most popular and fastest cloud data warehouse, offering seamless integration with your data lake and other data sources, up to three times faster performance than any other cloud data warehouse, automated maintenance, separation of storage and compute, and up to 75% lower cost than any other cloud data warehouse. This post […]

Enrich datasets for descriptive analytics with AWS Glue DataBrew

Data analytics remains a constantly hot topic. More and more businesses are beginning to understand the potential their data has to allow them to serve customers more effectively and give them a competitive advantage. However, for many small to medium businesses, gaining insight from their data can be challenging because they often lack in-house data […]

Automating Index State Management for Amazon OpenSearch Service

When it comes to time-series data, it’s more common to access new data than existing data, such as the last four hours or one day. Often, application teams must maintain multiple indexes for diverse data workloads, which bring new requirements to set up a custom solution to manage the index lifecycles. This becomes tedious as […]

Build a modern data architecture on AWS with Amazon AppFlow, AWS Lake Formation, and Amazon Redshift

This is a guest post written by Dr. Yannick Misteli, lead cloud platform and ML engineering in global product strategy (GPS) at Roche. Recently the Roche Data Insights (RDI) initiative was launched to achieve our vision using new ways of working and collaboration in order to build shared, interoperable data & insights with federated governance. […]

New features from Apache Hudi 0.7.0 and 0.8.0 available on Amazon EMR

Apache Hudi is an open-source transactional data lake framework that greatly simplifies incremental data processing and data pipeline development by providing record-level insert, update, and delete capabilities. This record-level capability is helpful if you’re building your data lakes on Amazon Simple Storage Service (Amazon S3) or Hadoop Distributed File System (HDFS). You can use it […]

Query cross-account AWS Glue Data Catalogs using Amazon Athena

Many AWS customers rely on a multi-account strategy to scale their organization and better manage their data lake across different projects or lines of business. The AWS Glue Data Catalog contains references to data used as sources and targets of your extract, transform, and load (ETL) jobs in AWS Glue. Using a centralized Data Catalog […]