AWS Big Data Blog
Big Data Website Gets a Big Makeover at AWS
Jorge A. Lopez is responsible for Big Data Solutions Marketing at AWS. The big data ecosystem is evolving at a tremendous pace, giving rise to a plethora of tools, use cases, and applications. The new AWS Big Data website is now the ideal starting point to learn about new and existing capabilities, and the services […]
Analyze Your Data on Amazon DynamoDB with Apache Spark
Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]
Month in Review: February 2016
Lots for big data enthusiasts in February on the AWS Big Data Blog. Take a look! Submitting User Applications with spark-submit: Learn how to set spark-submit flags to control the memory and compute resources available to your application submitted to Spark running on EMR. Learn when to use the maximizeResourceAllocation configuration option and dynamic allocation […]
Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams
Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time analytics on AWS to quickly extract value from large volumes of data streams? For example, have you built a recommendation engine on clickstream data to personalize content suggestions in real time […]
Introducing On-Demand Pipeline Execution in AWS Data Pipeline
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Join us at the AWS Big Data Meetup on February 24th in Palo Alto
Join and RSVP! Guest Speaker: Cory Dolphin from Twitter. Learn how Answers, Fabric’s real-time analytics product, processes billions of events in real time using Twitter’s new stream processing engine, Heron. Cory will explain some of the challenges the team faced while scaling Storm, and how Heron has helped them fly faster. Specifically, Cory will describe how Heron’s […]
Process Amazon Kinesis Aggregated Data with AWS Lambda
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Last year, we introduced the Amazon Kinesis Producer Library (KPL) to simplify the development of applications that need to send data to Amazon Kinesis Streams. Many customers use aggregation, which allows you to send multiple records to a single Amazon Kinesis Streams record. Although the […]
Big Data Analytics Options on AWS: Updated White Paper
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Erik Swensson is an Enterprise Solutions Architect Manager for AWS. The big data ecosystem is growing quickly. Many AWS services have recently been added, such as AWS Lambda, Amazon OpenSearch Service, Amazon […]
Amazon Redshift UDF repository on AWSLabs
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post. Did you ever have a need for complex string parsing in Amazon Redshift and wish you could simply add f_parse_url_query_string(url) to your SQL query? Have you ever tried to weigh which would be less […]
Submitting User Applications with spark-submit
Francisco Oliveira is a consultant with AWS Professional Services. Customers starting their big data journey often ask for guidelines on how to submit user applications to Spark running on Amazon EMR. For example, customers ask for guidelines on how to size memory and compute resources available to their applications and the best resource allocation model […]
