AWS Big Data Blog
Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena
In this walkthrough, you use H2O.ai, Amazon Athena, and RStudio to predict whether a song might make it into the Billboard Top 10. You explore GLM, GBM, and deep learning modeling techniques using H2O's fast, distributed, and easy-to-use open source parallel processing engine.
Preprocessing Data in Amazon Kinesis Analytics with AWS Lambda
Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda. This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. In this post, I discuss some common use cases for preprocessing, and walk you through an example to help highlight its applicability.
Build a Schema-On-Read Analytics Pipeline Using Amazon Athena
In this post, I show how to build a schema-on-read analytical pipeline, similar to the one used with relational databases, using Amazon Athena. The approach is completely serverless, which allows the analytical platform to scale as more data is stored and processed via the pipeline.
Amazon QuickSight Now Allows Users to Create Analyses from Dashboards and Import Custom Date Formats
Starting today, QuickSight allows users to save the contents of a dashboard as an analysis within their account. If you are a dashboard user, you can now create an analysis that contains all of the visuals from the dashboard.
Query and Visualize AWS Cost and Usage Data Using Amazon Athena and Amazon QuickSight
If you've ever wondered whether a serverless alternative exists for consuming and querying your AWS Cost and Usage report data, wonder no more. The answer is yes. This post introduces that solution and shows how simple it is to deploy.
Create Custom AMIs and Push Updates to a Running Amazon EMR Cluster Using Amazon EC2 Systems Manager
In this post, I show how Systems Manager Automation can be used to automate the creation and patching of custom Amazon Linux AMIs for EMR. I also show how you can use Run Command to send commands to all nodes of a running EMR cluster.
Unite Real-Time and Batch Analytics Using the Big Data Lambda Architecture, Without Servers!
In this post, I show you how you can use AWS services like AWS Glue to build a Lambda Architecture completely without servers. I use a practical demonstration to examine the tight integration between serverless services on AWS and create a robust data processing Lambda Architecture system.
Implement Continuous Integration and Delivery of Apache Spark Applications using AWS
In this post, we walk you through a solution that implements a continuous integration and deployment pipeline supported by AWS services. You can use the sample template and Spark application shared in this post and adapt them for the specific needs of your own application.
Amazon QuickSight Now Supports Search, Filter Groups, and Amazon S3 Analytics Connector
I'm excited to share some new features in Amazon QuickSight. You can now search for datasets, analyses, and dashboards. You can create filter groups whose multiple filter conditions are evaluated together using the OR operator. You can also use the built-in Amazon S3 analytics connector to visualize your S3 storage access patterns across multiple S3 buckets and configurations within a single Amazon QuickSight dashboard, helping you optimize for cost.
Analyzing Salesforce Data with Amazon QuickSight
In this post, we walk through creating a new dataset based on Salesforce data, creating an analysis and adding visuals, creating an Amazon QuickSight dashboard, and working with filters.