AWS Big Data Blog
Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3
Although it’s common for Amazon EMR customers to process data directly in Amazon S3, there are occasions where you might want to copy data from S3 to the Hadoop Distributed File System (HDFS) on your Amazon EMR cluster. Additionally, you might have a use case that requires moving large amounts of data between buckets or regions. In these use cases, large datasets are too big for a simple copy operation.
Read MoreAmazon QuickSight Adds Support for Amazon Redshift Spectrum
In April, we announced Amazon Redshift Spectrum in the AWS Blog. Redshift Spectrum is a new feature of Amazon Redshift that allows you to run complex SQL queries against exabytes of data in Amazon without having to load and transform any data. We’re happy to announce that Amazon QuickSight now supports Redshift Spectrum. Starting today, […]
Read MoreNew AWS Certification Specialty Exam for Big Data
AWS Certifications validate technical knowledge with an industry-recognized credential. Today, the AWS Certification team released the AWS Certified Big Data – Specialty exam. This new exam validates technical skills and experience in designing and implementing AWS services to derive value from data. The exam requires a current Associate AWS Certification and is intended for individuals […]
Read MoreBuild a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics
Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe. The insights gained by analysis of Amazon CloudFront access logs […]
Read MoreAmazon QuickSight Now Supports Federated Single Sign-On Using SAML 2.0
Since launch, Amazon QuickSight has enabled business users to quickly and easily analyze data from a wide variety of data sources with superfast visualization capabilities enabled by SPICE (Superfast, Parallel, In-memory Calculation Engine). When setting up Amazon QuickSight access for business users, administrators have a choice of authentication mechanisms. These include Amazon QuickSight–specific credentials, AWS […]
Read MoreBuild a Visualization and Monitoring Dashboard for IoT Data with Amazon Kinesis Analytics and Amazon QuickSight
Customers across the world are increasingly building innovative Internet of Things (IoT) workloads on AWS. With AWS, they can handle the constant stream of data coming from millions of new, internet-connected devices. This data can be a valuable source of information if it can be processed, analyzed, and visualized quickly in a scalable, cost-efficient manner. […]
Read MoreBuild a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP
In the healthcare field, data comes in all shapes and sizes. Despite efforts to standardize terminology, some concepts (e.g., blood glucose) are still often depicted in different ways. This post demonstrates how to convert an openly available dataset called MIMIC-III, which consists of de-identified medical data for about 40,000 patients, into an open source data […]
Read MoreTest Your Streaming Data Solution with the New Amazon Kinesis Data Generator
When building a streaming data solution, most customers want to test it with data that is similar to their production data. Creating this data and streaming it to your solution can often be the most tedious task in testing the solution. Amazon Kinesis Streams and Amazon Kinesis Firehose enable you to continuously capture and store […]
Read MoreAWS Big Data Blog Month in Review: April 2017
Another month of big data solutions on the Big Data Blog. Please take a look at our summaries below and learn, comment, and share. Thank you for reading! NEW POSTS Amazon QuickSight Spring Announcement: KPI Charts, Export to CSV, AD Connector, and More! In this blog post, we share a number of new features and […]
Read MoreTips for Migrating to Apache HBase on Amazon S3 from HDFS
Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase on Amazon S3. Running HBase on S3 gives you several added benefits, including lower costs, data durability, and easier scalability. HBase provides several options that you can use to migrate and back up HBase tables. The steps to migrate to HBase on […]
Read More