AWS Big Data Blog
Building Event-Driven Batch Analytics on AWS
In this post, I walk you through an architectural approach and a sample implementation for collecting, processing, and analyzing data for event-driven applications on AWS.
Month in Review: September 2016
Another month of big data solutions on the Big Data Blog. Take a look at our summaries below and learn, comment, and share. Thanks for reading! Processing VPC Flow Logs with Amazon EMR In this post, learn how to gain valuable insight into your network by using Amazon EMR and Amazon VPC Flow Logs. The […]
Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS
This post demonstrates how to set up Apache Kafka on EC2, use Spark Streaming on EMR to process data coming in to Apache Kafka topics, and query streaming data using Spark SQL on EMR.
Join us This Week at Strata + Hadoop World in New York City
Get technical details and best practices from AWS experts. Hear directly from customers and learn from the experience of other organizations that are deploying big data solutions on AWS.
Amazon EMR-DynamoDB Connector Repository on AWSLabs GitHub
Amazon Web Services is excited to announce that the Amazon EMR-DynamoDB Connector is now open-source. The code you see in the GitHub repository is exactly what is available on your EMR cluster, making it easier to build applications with this component.
Encrypt Data At-Rest and In-Flight on Amazon EMR with Security Configurations
Customers running analytics, stream processing, machine learning, and ETL workloads on personally identifiable information, health information, and financial data have strict requirements for encryption of data at-rest and in-transit. The Apache Spark and Hadoop ecosystems lend themselves to these big data use cases, and customers have asked us to provide a quick and easy way to encrypt data at-rest and data in-transit between nodes in each execution framework.
Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics
In this post, I show an analytics pipeline that detects anomalies in a web traffic stream in real time, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics.
Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 2
This post introduces you to the different types of windows supported by Amazon Kinesis Analytics, the importance of time as it relates to stream data processing, and best practices for sending your SQL results to a configured destination.
Month in Review: August 2016
Another month of big data solutions on the Big Data Blog. Take a look at our summaries below and learn, comment, and share. Thanks for reading! Readmission Prediction Through Patient Risk Stratification Using Amazon Machine Learning With this post, learn how to apply advanced analytics concepts like pattern analysis and machine learning to do risk […]
Integrating IoT Events into Your Analytic Platform
AWS IoT makes it easy to integrate and control your devices from other AWS services for even more powerful IoT applications. In particular, IoT provides tight integration with AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Machine Learning, Amazon DynamoDB, Amazon CloudWatch, and Amazon OpenSearch Service.