AWS Big Data Blog

Category: Intermediate (200)

Cross-account integration between SaaS platforms using Amazon AppFlow

Implementing an effective data sharing strategy that satisfies compliance and regulatory requirements is complex. Customers often need to share data between disparate software as a service (SaaS) platforms within their organization or across organizations. On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it […]

Exploring new ETL and ELT capabilities for Amazon Redshift from the AWS Glue Studio visual editor

In a modern data architecture, unified analytics enable you to access the data you need, whether it’s stored in a data lake or a data warehouse. In particular, we have observed an increasing number of customers who combine and integrate their data into an Amazon Redshift data warehouse to analyze huge data at scale and […]

Configure SAML federation for Amazon OpenSearch Serverless with AWS IAM Identity Center

Amazon OpenSearch Serverless is a serverless option of Amazon OpenSearch Service that makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and […]

Push Amazon EMR step logs from Amazon EC2 instances to Amazon CloudWatch logs

Amazon EMR is a big data service offered by AWS to run Apache Spark and other open-source applications on AWS to build scalable data pipelines in a cost-effective manner. Monitoring the logs generated from the jobs deployed on EMR clusters is essential to help detect critical issues in real time and identify root causes quickly. […]

Perform accent-insensitive search using OpenSearch

We often need our text search to be agnostic of accent marks. Accent-insensitive search, also called diacritics-agnostic search, is where search results are the same for queries that may or may not contain Latin characters such as à, è, Ê, ñ, and ç. Diacritics are English letters with an accent to mark a difference in […]

Visualize Confluent data in Amazon QuickSight using Amazon Athena

This is a guest post written by Ahmed Saef Zamzam and Geetha Anne from Confluent. Businesses are using real-time data streams to gain insights into their company’s performance and make informed, data-driven decisions faster. As real-time data has become essential for businesses, a growing number of companies are adapting their data strategy to focus on […]

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) […]

How AWS Payments migrated from Redash to Amazon Redshift Query Editor v2

AWS Payments is part of the AWS Commerce Platform (CP) organization that owns the customer experience of paying AWS invoices. It helps AWS customers manage their payment methods and payment preferences, and helps customers make self-service payments to AWS. The Machine Learning, Data and Analytics (MLDA) team at AWS Payments enables data-driven decision-making across payments […]

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg datasets tables using the native support of those data lake formats. This native support simplifies reading and writing your data for these data lake frameworks so you can more […]

Extend geospatial queries in Amazon Athena with UDFs and AWS Lambda

Amazon Athena is a serverless and interactive query service that allows you to easily analyze data in Amazon Simple Storage Service (Amazon S3) and 25-plus data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena built-in capabilities include querying for geospatial data; for example, you can count the number of […]