AWS Big Data Blog

Setting up trust between ADFS and AWS and using Active Directory credentials to connect to Amazon Athena with ODBC driver

This post walks you through configuring ADFS 3.0 on a Windows Server 2012 R2 Amazon Elastic Compute Cloud (Amazon EC2) instance and setting up trust between ADFS 3.0 IdP and AWS through SAML 2.0. The post then demonstrates how to install the Athena OBDC driver on Amazon Linux EC2 instance (RHEL instance) and configure it to use ADFS for authentication.

Read More

Using Random Cut Forests for real-time anomaly detection in Amazon Elasticsearch Service

Anomaly detection is a rich field of machine learning. Many mathematical and statistical techniques have been used to discover outliers in data, and as a result, many algorithms have been developed for performing anomaly detection in a computational setting. In this post, we take a close look at the output and accuracy of the anomaly detection feature available in Amazon Elasticsearch Service (Amazon ES) and Open Distro for Elasticsearch, and provide insight as to why we chose Random Cut Forests (RCF) as the core anomaly detection algorithm.

Read More

Running a high-performance SAS Grid Manager cluster on AWS with Amazon FSx for Lustre

SAS® is a software provider of data science and analytics used by enterprises and government organizations. SAS Grid is a highly available, fast processing analytics platform that offers centralized management that balances workloads across different compute nodes. This application suite is capable of data management, visual analytics, governance and security, forecasting and text mining, statistical […]

Read More

Moving to managed: The case for the Amazon Elasticsearch Service

You need to factor several considerations into your decision to move to a managed service. Obviously, you want your teams focused on doing meaningful work that propels the growth of your company. Deciding what processes you offload to a managed service versus what are best self-managed can be a challenge. Based on my experience managing Elasticsearch at my prior employer, and having worked with thousands of customers who have migrated to AWS, I consider the following sections important topics for you to review.

Read More

Monitor and control the storage space of a schema with quotas with Amazon Redshift

Many organizations are moving toward self-service analytics, where different personas create their own insights on the evolved volume, variety, and velocity of data to keep up with the acceleration of business. This data democratization creates the need to enforce data governance, control cost, and prevent data mismanagement. Controlling the storage quota of different personas is a significant challenge for data governance and data storage operation. This post shows you how to set up Amazon Redshift storage quotas by different personas.

Read More

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) boundaries using AWS PrivateLink. In addition, the post discusses the pattern that the Transaction Banking team at Goldman Sachs (TxB) chose for their cross-account access, the reasons behind their […]

Read More

Best practices for configuring your Amazon Elasticsearch Service domain

Amazon Elasticsearch Service (Amazon ES) is a fully managed service that makes it easy to deploy, secure, scale, and monitor your Elasticsearch cluster in the AWS Cloud. Elasticsearch is a distributed database solution, which can be difficult to plan for and execute. This post discusses some best practices for deploying Amazon ES domains.

Read More