AWS Big Data Blog

Integrating Splunk with Amazon Kinesis Streams

Prahlad Rao is a Solutions Architect with AWS. It is important to not only be able to stream and ingest terabytes of data at scale, but to quickly get insights and visualize data using available tools and technologies. The Amazon Kinesis platform of managed services enables continuous capture and stores terabytes of data per hour from […]

Using AWS Lambda for Event-driven Data Processing Pipelines

Vadim Astakhov is a Solutions Architect with AWS. Some big data customers want to analyze new data in response to a specific event, and they might already have well-defined pipelines to perform batch processing, orchestrated by AWS Data Pipeline. One example of event-triggered pipelines is when data analysts must analyze data as soon as it […]

Building a Graph Database on AWS Using Amazon DynamoDB and Titan

Nick Corbett is a Big Data Consultant for AWS Professional Services. You might not know it, but a graph has changed your life. A bold claim perhaps, but companies such as Facebook, LinkedIn, and Twitter have revolutionized the way society interacts through their ability to manage a huge network of relationships. However, graphs aren’t just […]

Videos now available for AWS re:Invent 2015 Big Data Analytics sessions

by Jonathan Fritz

For those of you who were able to attend AWS re:Invent 2015 last week or watched sessions through our live stream, thanks for participating in the conference. We hope you left feeling inspired to tackle your big data projects with tools in the AWS ecosystem and partner solutions. Also, we were excited for our customers […]

Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda

Derek Graeber is a Senior Consultant in Big Data Analytics for AWS Professional Services. Streaming data analytics is becoming main-stream (pun intended) in large enterprises as the technology stacks have become more user-friendly to implement. For example, Spark-Streaming connected to an Amazon Kinesis stream is a typical model for real-time analytics. But one area that […]

Automating Analytic Workflows on AWS

Wangechi Doble is a Solutions Architect with AWS. Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, […]

Analyze Data with Presto and Airpal on Amazon EMR

Songzhi Liu is a Professional Services Consultant with AWS. You can now launch Presto version 0.119 on Amazon EMR, allowing you to easily spin up a managed EMR cluster with the Presto query engine and run interactive analysis on data stored in Amazon S3. You can integrate with Spot instances, publish logs to an S3 […]

AWS Big Data Analytics Sessions at re:Invent 2015

by Roy Ben-Alta

Roy Ben-Alta is a Business Development Manager – Big Data & Analytics. If you will be attending re:Invent 2015 in Las Vegas next week, you know that you’ll have many opportunities to learn more about Big Data & Analytics on AWS at the conference–and this year we have over 20 sessions! The following breakout sessions compose this […]

How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct

This is a guest post by Sourabh Bajaj, a Software Engineer at Coursera. Coursera in their own words: “Coursera is an online educational startup with over 14 million learners across the globe. We offer more than 1000 courses from over 120 top universities.” At Coursera, we use Amazon Redshift as our primary data warehouse because […]

Scaling Writes on Amazon DynamoDB Tables with Global Secondary Indexes

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon DynamoDB is a fast, flexible, and fully managed NoSQL database service that supports both document and key-value store models that need consistent, single-digit millisecond latency at any scale. In this post, we discuss a technique that can be used with DynamoDB to ensure virtually […]
