AWS Big Data Blog
Integrating Splunk with Amazon Kinesis Streams
Prahlad Rao is a Solutions Architect with AWS. It is important not only to be able to stream and ingest terabytes of data at scale, but also to quickly get insights and visualize data using available tools and technologies. The Amazon Kinesis platform of managed services enables you to continuously capture and store terabytes of data per hour from […]
Using AWS Lambda for Event-driven Data Processing Pipelines
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Building a Graph Database on AWS Using Amazon DynamoDB and Titan
At AWS re:Invent 2017, we announced the preview of Amazon Neptune, a fast and reliable graph database built for the cloud. Though this blog post still shows the benefits a graph database can deliver for certain use cases, if you are about to build an application yourself and need a graph database, you should first […]
Videos now available for AWS re:Invent 2015 Big Data Analytics sessions
For those of you who were able to attend AWS re:Invent 2015 last week or watched sessions through our live stream, thanks for participating in the conference. We hope you left feeling inspired to tackle your big data projects with tools in the AWS ecosystem and partner solutions. Also, we were excited for our customers […]
Persist Streaming Data to Amazon S3 using Amazon Data Firehose and AWS Lambda
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Streaming data analytics is becoming main-stream (pun intended) in large enterprises as the technology stacks have become more user-friendly to implement. For example, Spark Streaming connected to an Amazon Kinesis stream is a […]
Automating Analytic Workflows on AWS
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Analyze Data with Presto and Airpal on Amazon EMR
Songzhi Liu is a Professional Services Consultant with AWS. You can now launch Presto version 0.119 on Amazon EMR, allowing you to easily spin up a managed EMR cluster with the Presto query engine and run interactive analysis on data stored in Amazon S3. You can integrate with Spot instances, publish logs to an S3 […]
AWS Big Data Analytics Sessions at re:Invent 2015
Roy Ben-Alta is a Business Development Manager – Big Data & Analytics. If you will be attending re:Invent 2015 in Las Vegas next week, you know that you’ll have many opportunities to learn more about Big Data & Analytics on AWS at the conference, and this year we have over 20 sessions! The following breakout sessions compose this […]
How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Scaling Writes on Amazon DynamoDB Tables with Global Secondary Indexes
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon DynamoDB is a fast, flexible, and fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale, and it supports both document and key-value store models. In this post, we discuss a technique that can be used with DynamoDB to ensure virtually […]