AWS Big Data Blog

Category: Analytics

Analyzing Amazon S3 server access logs using Amazon ES

When you use Amazon Simple Storage Service (Amazon S3) to store corporate data and host websites, you need additional logging to monitor access to your data and the performance of your application. An effective logging solution enhances security and improves the detection of security incidents. With the advent of increased data storage needs, you can […]

Read More

Implementing LDAP authentication for Hive on a multi-tenant Amazon EMR cluster

As Amazon EMR continues its widespread adoption, it’s important to enforce separation of duties using role-based access when submitting your hive jobs on EMR clusters in multi-tenant environments. In this post, we walk through the steps to set up authentication for Hive using Lightweight Directory Access Protocol (LDAP) and Microsoft Active Directory Domain Controller. Solution […]

Read More

Enhanced monitoring and automatic scaling for Apache Flink

Thousands of developers use Apache Flink to build streaming applications to transform and analyze data in real time. Apache Flink is an open-source framework and engine for processing data streams. It’s highly available and scalable, delivering high throughput and low latency for the most demanding stream-processing applications. Monitoring and scaling your applications is critical to […]

Read More

Stream CDC into an Amazon S3 data lake in Parquet format with AWS DMS

Most organizations generate data in real time and ever-increasing volumes. Data is captured from a variety of sources, such as transactional and reporting databases, application logs, customer-facing websites, and external feeds. Companies want to capture, transform, and analyze this time-sensitive data to improve customer experiences, increase efficiency, and drive innovations. With increased data volume and […]

Read More

Amazon EMR supports Apache Hive ACID transactions

Apache Hive is an open-source data warehouse package that runs on top of an Apache Hadoop cluster. You can use Hive for batch processing and large-scale data analysis. Hive uses Hive Query Language (HiveQL), which is similar to SQL. ACID (atomicity, consistency, isolation, and durability) properties make sure that the transactions in a database are […]

Read More

Zoopla drives KPIs with centralized data using Fivetran ELT for Amazon Redshift

This is a guest post by Steven Collings, Senior Data Consultant at Zoopla Zoopla is a property website that enables users to find residential or commercial property to buy or rent in the UK and overseas. Since acquiring Property Software Group and Expert Agent, we also offer a backend software that agents can use to […]

Read More

Fast and predictable performance with serverless compilation using Amazon Redshift

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Customers tell us that they want extremely fast query response times so they can make equally fast decisions. This post presents the recently launched, […]

Read More

Power data analytics, monitoring, and search use cases with the Open Distro for Elasticsearch SQL Engine on Amazon ES

Amazon Elasticsearch Service (Amazon ES) is a popular choice for log analytics, search, real-time application monitoring, clickstream analysis, and more. One commonality among these use cases is the need to write and run queries to obtain search results at lightning speed. However, doing so requires expertise in the JSON-based Elasticsearch query domain-specific language (Query DSL). […]

Read More

How Aruba Networks built a cost analysis solution using AWS Glue, Amazon Redshift, and Amazon QuickSight

This is a guest post co-written by Siddharth Thacker and Swatishree Sahu from Aruba Networks. Aruba Networks is a Silicon Valley company based in Santa Clara that was founded in 2002 by Keerti Melkote and Pankaj Manglik. Aruba is the industry leader in wired, wireless, and network security solutions. Hewlett-Packard acquired Aruba in 2015, making […]

Read More