AWS Big Data Blog

Amazon Redshift Engineering’s Advanced Table Design Playbook: Distribution Styles and Distribution Keys

Part 1: Preamble, Prerequisites, and Prioritization; Part 2: Distribution Styles and Distribution Keys (Translated into Japanese); Part 3: Compound and Interleaved Sort Keys; Part 4: Compression Encodings; Part 5: Table Data Durability. The first table and column properties we discuss in this blog series are table distribution styles (DISTSTYLE) and distribution keys (DISTKEY). This blog […]


Amazon Redshift Engineering’s Advanced Table Design Playbook: Preamble, Prerequisites, and Prioritization

Part 1: Preamble, Prerequisites, and Prioritization (Translated into Japanese); Part 2: Distribution Styles and Distribution Keys; Part 3: Compound and Interleaved Sort Keys; Part 4: Compression Encodings; Part 5: Table Data Durability. Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. AWS customers use Amazon […]


Implementing Authorization and Auditing using Apache Ranger on Amazon EMR

Role-based access control (RBAC) is an important security requirement for multi-tenant Hadoop clusters. Enforcing this across always-on and transient clusters can be hard to set up and maintain. Imagine an organization that has an RBAC matrix using Active Directory users and groups. They would like to manage it on a central security policy server and […]


Analyzing Data in S3 using Amazon Athena

Neil Mukerje is a Solution Architect for Amazon Web Services. Abhishek Sinha is a Senior Product Manager on Amazon Athena. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage and you can […]


Introducing the Data Lake Solution on AWS

NOTE: The solution in this post is in the process of being updated. For the most current information, please visit the What is a data lake? page. This blog post has been translated into Japanese. Many of our customers choose to build their data lake on AWS. They find the flexible, pay-as-you-go, cloud model is […]


Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3

John Hitchingham is Director of Performance Engineering at FINRA. The Financial Industry Regulatory Authority (FINRA) is a private sector regulator responsible for analyzing 99% of the equities and 65% of the option activity in the US. In order to look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust […]


Dynamically Scale Applications on Amazon EMR with Auto Scaling

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Customers running Apache Spark, Presto, and the Apache Hadoop ecosystem take advantage of Amazon EMR’s elasticity to save costs by terminating clusters after workflows are complete and resizing clusters with low-cost Amazon EC2 Spot Instances. For instance, customers can create clusters for daily ETL or machine learning […]


Build a Community of Analysts with Amazon QuickSight

Imagine you’ve just landed your dream job. You’ve always liked tackling the hardest problems and you’ve got one now: You’ll work for a chain of coffee shops that’s struggling against fierce competition, tight budgets, and low morale. But there’s a new management team in place. As head of business intelligence (BI), you think you can […]


Scale Your Amazon Kinesis Stream Capacity with UpdateShardCount

Allan MacInnis is a Kinesis Solution Architect for Amazon Web Services. Starting today, you can easily scale your Amazon Kinesis streams to respond in real time to changes in your streaming data needs. Customers use Amazon Kinesis to capture, store, and analyze terabytes of data per hour from clickstreams, financial transactions, social media feeds, and […]


re:Invent 2016: AWS Big Data & Machine Learning Sessions

Roy Ben-Alta is Sr. Business Development Manager at AWS – Big Data & Machine Learning. Updated December 9, 2016 with links to session videos. We can’t believe that there are just a couple of weeks left before re:Invent 2016. If you are attending this year, you will want to check out our Big Data sessions! […]
