AWS Big Data Blog

Category: Best Practices

BDB-2071-Virtual_key_2

New features from Apache Hudi 0.9.0 on Amazon EMR

Apache Hudi is an open-source transactional data lake framework that greatly simplifies incremental data processing and data pipeline development. It does this by providing transaction support and record-level insert, update, and delete capabilities on data lakes on Amazon Simple Storage Service (Amazon S3) or Apache HDFS. Apache Hudi is integrated with open-source big data analytics […]

Read More

What to consider when migrating data warehouse to Amazon Redshift

Customers are migrating data warehouses to Amazon Redshift because it’s fast, scalable, and cost-effective. However, data warehouse migration projects can be complex and challenging. In this post, I help you understand the common drivers of data warehouse migration, migration strategies, and what tools and services are available to assist with your migration project. Let’s first […]

Read More

Unify log aggregation and analytics across compute platforms

Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as possible. Doing this gets challenging with the growing volume of data needing to be quickly detected, analyzed, and stored. In this post, we […]

Read More
Amazon OpenSearch Service Storage

Choose the right storage tier for your needs in Amazon OpenSearch Service

Amazon OpenSearch Service enables organizations to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite derived from Elasticsearch. Amazon OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards […]

Read More

Cybersecurity Awareness Month: Learn about the job zero of securing your data using Amazon Redshift

Amazon Redshift is the most widely used cloud data warehouse. It allows you to run complex analytic queries against terabytes to petabytes of structured and semi-structured data, using sophisticated query optimization, columnar on high-performance storage, and massively parallel query execution. At AWS, we embrace the culture that security is job zero, by which we mean […]

Read More