AWS Big Data Blog
Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming
This post shows how to use Spark Streaming to process data from Amazon Kinesis streams, build graphs in Zeppelin, and store the Zeppelin notebook in Amazon S3.
Analyze Your Data on Amazon DynamoDB with Apache Spark
Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]
Using IPython Notebook to Analyze Data with Amazon EMR
Manjeet Chayel is a Solutions Architect with AWS. IPython Notebook is a web-based interactive environment that lets you combine code, code execution, mathematical functions, rich documentation, plots, and other elements into a single document. In the background, IPython Notebook stores this information as a JSON document. The main advantage of a notebook when compared to […]
Running Apache Accumulo on Amazon EMR
Manjeet Chayel is a Solutions Architect with Amazon Web Services. This post was co-authored by Matt Yanchyshyn, a Principal Solutions Architect with Amazon Web Services. Apache Accumulo is a sorted, distributed key-value store that is built on top of Apache Hadoop, ZooKeeper, and Thrift. Accumulo was originally modeled after Google’s Bigtable and can scale to […]
ETL Processing Using AWS Data Pipeline and Amazon Elastic MapReduce
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
