AWS Big Data Blog
Category: AWS Big Data
Month in Review: February 2016
Lots for big data enthusiasts in February on the AWS Big Data Blog. Take a look! Submitting User Applications with spark-submit: Learn how to set spark-submit flags to control the memory and compute resources available to your application submitted to Spark running on EMR. Learn when to use the maximizeResourceAllocation configuration option and dynamic allocation […]
Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams
Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time analytics on AWS to quickly extract value from large volumes of data streams? For example, have you built a recommendation engine on clickstream data to personalize content suggestions in real time […]
Introducing On-Demand Pipeline Execution in AWS Data Pipeline
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Join us at the AWS Big Data Meetup on February 24th in Palo Alto
Join and RSVP! Guest Speaker: Cory Dolphin from Twitter. Learn how Answers, Fabric’s realtime analytics product, processes billions of events in realtime using Twitter’s new stream processing engine, Heron. Cory will explain some of the challenges the team faced while scaling Storm, and how Heron has helped them fly faster. Specifically, Cory will describe how Heron’s […]
Process Amazon Kinesis Aggregated Data with AWS Lambda
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Last year, we introduced the Amazon Kinesis Producer Library (KPL) to simplify the development of applications that need to send data to Amazon Kinesis Streams. Many customers use aggregation, which allows you to send multiple records to a single Amazon Kinesis Streams record. Although the […]
Big Data Analytics Options on AWS: Updated White Paper
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Erik Swensson is an Enterprise Solutions Architect Manager for AWS. The big data ecosystem is growing quickly. Many AWS services have recently been added, such as AWS Lambda, Amazon OpenSearch Service, Amazon […]
Amazon Redshift UDF repository on AWSLabs
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post. Did you ever have a need for complex string parsing in Amazon Redshift and wish you could simply add f_parse_url_query_string(url) to your SQL query? Have you ever tried to weigh which would be less […]
Submitting User Applications with spark-submit
Francisco Oliveira is a consultant with AWS Professional Services. Customers starting their big data journey often ask for guidelines on how to submit user applications to Spark running on Amazon EMR. For example, customers ask for guidelines on how to size memory and compute resources available to their applications and the best resource allocation model […]
Month in Review (January 2016)
Lots for big data enthusiasts in January on the AWS Big Data Blog. Take a look! Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR: Learn how to set up Zeppelin running “off-cluster” on a separate EC2 instance. You’ll be able to submit Spark jobs to an EMR cluster directly […]
Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile
Michael Wallman is a senior consultant with AWS ProServ. Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]