AWS Big Data Blog

Securely share your data across AWS accounts using AWS Lake Formation

Data lakes have become very popular with organizations that want a centralized repository that allows you to store all your structured data and unstructured data at any scale. Because data is stored as is, there is no need to convert it to a predefined schema in advance. When you have new business use cases, you […]

Backtest trading strategies with Amazon Kinesis Data Streams long-term retention and Amazon SageMaker

July 2023: This post was reviewed for accuracy. Real-time insight is critical when it comes to building trading strategies. Any delay in data insight can cost lot of money to the traders. Often, you need to look at historical market trends to predict future trading pattern and make the right bid. More the historical data […]

Build event-driven data quality pipelines with AWS Glue DataBrew

Businesses collect more and more data every day to drive processes like decision-making, reporting, and machine learning (ML). Before cleaning and transforming your data, you need to determine whether it’s fit for use. Incorrect, missing, or malformed data can have large impacts on downstream analytics and ML processes. Performing data quality checks helps identify issues […]

Transform data and create dashboards using AWS Glue DataBrew and Tableau

Before you can create visuals and dashboards that convey useful information, you need to transform and prepare the underlying data. With AWS Glue DataBrew, you can now easily transform and prepare datasets from Amazon Simple Storage Service (Amazon S3), an Amazon Redshift data warehouse, Amazon Aurora, and other Amazon Relational Database Service (Amazon RDS) databases […]

Define error handling for Amazon Redshift Spectrum data

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift Spectrum allows you to query open format data directly from the Amazon Simple Storage Service (Amazon S3) data lake without having to load the data into Amazon Redshift tables. With Redshift Spectrum, you can query open file formats such as […]

Set up cross-account audit logging for your Amazon Redshift cluster

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the best practices of modern application design is to have centralized logging. Troubleshooting application problems is easy when you can correlate […]

Federate access to Amazon Redshift using the JDBC browser plugin for Single Sign-on authentication with Microsoft Azure Active Directory

Since 2020, Amazon Redshift has supported multi-factor authentication (MFA) to any SAML 2.0 compliant identity provider (IdP) in our JDBC and ODBC drivers. You can map the IdP user identity and group memberships in order to control authorization for database objects in Amazon Redshift. This simplifies administration by enabling you to manage user access in […]

Optimize your analytical workloads using the automatic query rewrite feature of Amazon Redshift materialized views

Amazon Redshift materialized views enable you to significantly improve performance of complex queries that are frequently run as part of your extract, load, and transform (ELT), business intelligence (BI), or dashboarding applications. Materialized views precompute and store the result sets of the SQL query in the view definition. Materialized views speed up data access, because […]

How Belcorp decreased cost and improved reliability in its big data processing framework using Amazon EMR managed scaling

This is a guest post by Diego Benavides and Luis Bendezú, Senior Data Architects, Data Architecture Direction at Belcorp. Belcorp is one of the main consumer packaged goods (CPG) companies providing cosmetics products in the region for more than 50 years, allocated to around 13 countries in North, Central, and South America (AMER). Born in Peru […]

Power highly resilient use cases with Amazon Redshift

Amazon Redshift is the most popular and fastest cloud data warehouse, offering seamless integration with your data lake and other data sources, up to three times faster performance than any other cloud data warehouse, automated maintenance, separation of storage and compute, and up to 75% lower cost than any other cloud data warehouse. This post […]