AWS Big Data Blog

Category: Analytics

Stream data to an HTTP endpoint with Amazon Kinesis Data Firehose

The value of data is time sensitive. Streaming data services can help you move data quickly from data sources to new destinations for downstream processing. For example, Amazon Kinesis Data Firehose can reliably load streaming data into data stores like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk. […]

Read More

Manage and control your cost with Amazon Redshift Concurrency Scaling and Spectrum

This post shares the simple steps you can take to use the new Amazon Redshift usage controls feature to monitor and control your usage and associated cost for Amazon Redshift Spectrum and Concurrency Scaling features. Redshift Spectrum enables you to power a lake house architecture to directly query and join data across your data warehouse and data lake, and Concurrency Scaling enables you to support thousands of concurrent users and queries with consistently fast query performance.

Read More

Federate access to your Amazon Redshift cluster with Active Directory Federation Services (AD FS): Part 2

In the first post of this series, Federating access to your Amazon Redshift cluster with Active Directory: Part 1, you set up Microsoft Active Directory Federation Services (AD FS) and Security Assertion Markup Language (SAML) based authentication and tested the SAML federation using a web browser. In Part 2, you learn to set up an […]

Read More

Federate access to your Amazon Redshift cluster with Active Directory Federation Services (AD FS): Part 1

Many customers request detailed steps to set up federated single sign-on (SSO) using Microsoft Active Directory Federation Services (AD FS) for Amazon Redshift. In this two-part series, you will find detailed steps to achieve federated SSO using AD FS. Part 1 contains steps to set up a Windows 2016 domain controller and AD FS and […]

Read More

Enable fine-grained data access in Zeppelin Notebook with AWS Lake Formation

This post explores how you can use AWS Lake Formation integration with Amazon EMR (still in beta) to implement fine-grained column-level access controls while using Spark in a Zeppelin Notebook. My previous post Extract Salesforce.com data using AWS Glue and analyzing with Amazon Athena showed you a simple use case for extracting any Salesforce object data using AWS Glue and Apache Spark, saving it to Amazon Simple Storage Service (Amazon S3), cataloging the data using the Data Catalog in Glue, and querying it using Amazon Athena.

Read More

Improving RAPIDS XGBoost performance and reducing costs with Amazon EMR running Amazon EC2 G4 instances

This is a guest post by Kong Zhao, Solution Architect at NVIDIA Corporation This post shares how NVIDIA sped up RAPIDS XGBoost performance up to 4.5 times faster and reduced costs up to 5.4 times less by using Amazon EMR running Amazon Elastic Compute Cloud (Amazon EC2) G4 instances. Gradient boosting is a powerful machine […]

Read More

Best practices from Delhivery on migrating from Apache Kafka to Amazon MSK

This is a guest post by Delhivery. In this post, we describe the steps Delhivery took to migrate from self-managed Apache Kafka running on Amazon Elastic Compute Cloud (Amazon EC2) to Amazon Managed Streaming for Apache Kafka (Amazon MSK). “We’ve been in production for over a year now,” said Akash Deep Verma, Senior Technical Architect, […]

Read More

Control data access and permissions with AWS Lake Formation and Amazon EMR

What if you could control the access to your data lake centrally? Would it be more convenient to share specific data securely with internal and external customers? With AWS Lake Formation and its integration with Amazon EMR, you can easily perform these administrative tasks. This post goes through a use case and reviews the steps to control the data access and permissions of your existing data lake.

Read More

Develop an application migration methodology to modernize your data warehouse with Amazon Redshift

This post demonstrates how to develop a comprehensive, wave-based application migration methodology for a complex project to modernize a traditional MPP data warehouse with Amazon Redshift. It provides best practices and lessons learned by considering business priority, data dependency, workload profiles and existing service level agreements (SLAs).

Read More

Simplifying and modernizing home search at Compass with Amazon Elasticsearch Service

In this post, we learn how Compass’s search solution evolved, what challenges and benefits they found with different architectures, and how Amazon ES gives them a long-term scalable solution. We also see how Amazon Managed Streaming for Apache Kafka (Amazon MSK) helped create event-driven, real-time streaming capabilities of property listing data. You can apply this solution to similar use cases.

Read More