AWS Big Data Blog
Presto-Amazon Kinesis Connector for Interactively Querying Streaming Data
This is a guest post by Sivaramakrishnan Narayanan, Member of Technical Staff at Qubole, and Xing Quan, Director of Product Management at Qubole. Qubole is an AWS Advanced Technology Partner. Amazon Kinesis is a scalable and fully managed service for streaming large, distributed data sets. As applications (particularly on mobile and wearable devices) start to […]
Building Scalable and Responsive Big Data Interfaces with AWS Lambda
This is a guest post by Martin Holste, a co-founder of the Threat Analytics Platform at FireEye where he is a senior researcher specializing in prototypes. Overview: At FireEye, Inc., we process billions of security events every day with our Threat Analytics Platform, running on AWS. In building our platform, one of the problems we […]
How Expedia Implemented Near Real-time Analysis of Interdependent Datasets
This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real-time. Expedia’s marketing group needed to analyze […]
Applying Machine Learning to Text Mining with Amazon S3 and RapidMiner
Gopal Wunnava is a Senior Consultant with AWS Professional Services. By some estimates, 80% of an organization’s data is unstructured content. This content includes web pages, call center transcripts, surveys, feedback forms, legal documents, forums, social media, and blog articles. Therefore, organizations must analyze not just transactional information but also textual content to gain insight […]
Large-Scale Machine Learning with Spark on Amazon EMR
This is a guest post by Jeff Smith, Data Engineer at Intent Media. Intent Media, in their own words: “Intent Media operates a platform for advertising on commerce sites. We help online travel companies optimize revenue on their websites and apps through sophisticated data science capabilities. On the data team at Intent Media, we are […]
Building a Binary Classification Model with Amazon Machine Learning and Amazon Redshift
Guy Ernest is a Solutions Architect with AWS. This post builds on Guy’s earlier posts Building a Numeric Regression Model with Amazon Machine Learning and Building a Multi-Class ML Model with Amazon Machine Learning. Many decisions in life are binary, answered either Yes or No. Many business problems also have binary answers. For example: “Is […]
Test drive two big data scenarios from the ‘Building a Big Data Platform on AWS’ bootcamp
Matt Yanchyshyn is a Sr. Manager for AWS Solutions Architecture. AWS offers a number of events during the year such as our annual AWS re:Invent conference, the AWS Summit series, the AWS Pop-up Loft, and a variety of roadshows. All of these provide opportunities for AWS customers to attend talks focused on big data and […]
Indexing Common Crawl Metadata on Amazon EMR Using Cascading and Elasticsearch
Hernan Vivani is a Big Data Support Engineer for Amazon Web Services. A previous post showed you how to get started with Elasticsearch and Kibana on Amazon EMR. In that post, we installed Elasticsearch and Kibana on an Amazon EMR cluster using bootstrap actions. This post shows you how to build a simple application with […]
Building a Multi-Class ML Model with Amazon Machine Learning
Guy Ernest is a Solutions Architect with AWS. This post builds on our earlier post Building a Numeric Regression Model with Amazon Machine Learning. We often need to assign an object (product, article, or customer) to its class (product category, article topic or type, or customer segment). For example, which category of products is most […]
Optimizing for Star Schemas and Interleaved Sorting on Amazon Redshift
Chris Keyser is a Solutions Architect for AWS. Many organizations implement star and snowflake schema data warehouse designs, and many BI tools are optimized to work with dimensions, facts, and measure groups. Customers have moved data warehouses of all types to Amazon Redshift with great success. The Amazon Redshift team has released support for interleaved […]