Amazon Web Services Blog
Here are some of the events that we have on tap for the next week or two:
- Wednesday, October 22 - Webinar - Stream Processing Workflows at Scale with Amazon Kinesis.
- Tuesday, October 28 - Webinar - Amazon DynamoDB Deep Dive.
- Wednesday, October 29 - Webinar - Elastic MapReduce Design Patterns and Best Practices.
- Wednesday, November 5 - Webinar - Ed Tech Forecast: Cloudy and a Strong Chance of Learner Success.
As you know, I'm a big fan of Amazon RDS. I love the fact that it allows you to focus on your applications and not on keeping your database up and running. I'm also excited by the disruptive price, performance, and ease of use of Amazon Redshift, our petabyte-scale, fully managed data warehouse service that lets you get started for $0.25 per hour and costs less than $1,000 per TB per year. Many customers agree, as you can see from recent posts by Pinterest, Monetate, and Upworthy.
Many AWS customers want to get their operational and transactional data from RDS into Redshift in order to run analytics. Until recently, this was a somewhat complicated process. A few weeks ago, the RDS team simplified it by enabling row-based binary logging, which in turn has allowed our AWS Partner Network (APN) partners to build products that continuously replicate data from RDS MySQL to Redshift.
Two APN data integration partners, FlyData and Attunity, currently leverage row-based binary logging to continuously replicate data from RDS MySQL to Redshift. Both offer free trials of their software in conjunction with Redshift's two-month free trial. After a few simple configuration steps, these products will automatically copy schemas and data from RDS MySQL to Redshift and keep them in sync. This will allow you to run high-performance reports and analytics on up-to-date data in Redshift without having to design a complex data loading process or put unnecessary load on your RDS database instances.
If you're using RDS MySQL 5.6, you can replicate directly from your database instance by enabling row-based logging, as shown below. If you're using RDS MySQL 5.5, you'll need to set up a MySQL 5.6 read replica and configure the replication tools to use the replica to sync your data to Redshift. To learn more about these two solutions, see FlyData's Free Trial Guide for RDS MySQL to Redshift as well as Attunity's Free Trial and the RDS MySQL to Redshift Guide. Attunity's trial is available through the AWS Marketplace, where you can find and immediately start using software with Redshift with just a few clicks.
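If you'd like to script the read replica step, here's a minimal sketch using boto3, the AWS SDK for Python (the instance identifiers and instance class are placeholders, and the full 5.5-to-5.6 upgrade path is described in the partner guides):

```python
# Hypothetical sketch: create a read replica that replication tools
# can use as a sync source. Names below are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-56-replica",    # name for the new replica
    SourceDBInstanceIdentifier="mydb-source",  # your existing RDS MySQL instance
    DBInstanceClass="db.m3.medium",            # sizing is an assumption
)
```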
Informatica and SnapLogic also enable data integration between RDS and Redshift, using a SQL-based mechanism that queries your database to identify data to transfer to your Amazon Redshift clusters. Informatica is offering a 60-day free trial and SnapLogic has a 30-day free trial.
All four data integration solutions discussed above can be used with all RDS database engines (MySQL, SQL Server, PostgreSQL, and Oracle). You can also use AWS Data Pipeline (which recently added some Redshift enhancements) to move data between your RDS database instances and Redshift clusters. If you have analytics workloads, now is a great time to take advantage of these tools and begin continuously loading and analyzing data in Redshift.
Enabling Amazon RDS MySQL 5.6 Row-Based Logging
Here's how you enable row-based logging for MySQL 5.6:
- Go to the Amazon RDS Console and click Parameter Groups in the left pane.
- Click the Create DB Parameter Group button and create a new parameter group in the mysql5.6 family.
- Once in the detail view, click the Edit Parameters button, then set the binlog_format parameter to ROW (a scripted equivalent of these steps is sketched below).
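If you prefer to do this programmatically, here is a minimal sketch of the same steps using boto3 (the parameter group name is a placeholder):

```python
# Sketch of the console workflow above: create a mysql5.6 parameter
# group and enable row-based binary logging.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Create a new parameter group in the mysql5.6 family...
rds.create_db_parameter_group(
    DBParameterGroupName="mysql56-row-logging",
    DBParameterGroupFamily="mysql5.6",
    Description="Row-based binary logging for Redshift replication",
)

# ...and set binlog_format to ROW.
rds.modify_db_parameter_group(
    DBParameterGroupName="mysql56-row-logging",
    Parameters=[
        {"ParameterName": "binlog_format",
         "ParameterValue": "ROW",
         "ApplyMethod": "immediate"},
    ],
)
```

You'll also need to associate the new parameter group with your database instance (and reboot it) before the setting takes effect.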
Free Trials for Continuous RDS to Redshift Replication from APN Partners
FlyData has published a step-by-step guide and a video demo that show you how to continuously and automatically sync your RDS MySQL 5.6 data to Redshift, and you can get started for free for 30 days. You will need to create a new parameter group with binlog_format set to ROW and binlog_checksum set to NONE, and adjust a few other parameters as described in the guide above.
AWS customers are already using FlyData for continuous replication to Redshift from RDS. For example, rideshare startup Sidecar seamlessly syncs tens of millions of records per day to Redshift from two RDS instances in order to analyze how customers utilize Sidecar's custom ride services. According to Sidecar, their analytics run 3x faster and the near-real-time access to data helps them to provide a great experience for riders and drivers. Here's the data flow when using FlyData:
Attunity CloudBeam has published a configuration guide that describes how you can enable continuous, incremental change data capture from RDS MySQL 5.6 to Redshift (you can get started for free for 5 days directly from the AWS Marketplace). You will need to create a new parameter group with binlog_format set to ROW, as described above.
For additional information on configuring Attunity for use with Redshift, please see this quick start guide.
Redshift Free Trial
If you are new to Amazon Redshift, you’re eligible for a free trial and can get 750 free hours for each of two months to try a dw2.large node (16 GB of RAM, 2 virtual cores, and 160 GB of compressed SSD storage). This gives you enough hours to continuously run a single node for two months. You can also build clusters with multiple dw2.large nodes to test larger data sets; this will consume your free hours more quickly. Each month's 750 free hours are shared across all running dw2.large nodes in all regions.
To start using Redshift for free, simply go to the Redshift Console, launch a cluster, and select dw2.large for the Node Type:
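If you'd rather launch the trial cluster from code, here's a hedged sketch using boto3 (the identifier, credentials, and database name are placeholders):

```python
# Launch a single-node dw2.large cluster, which stays within the
# 750 free hours per month of the Redshift free trial.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="free-trial-cluster",
    NodeType="dw2.large",               # the free trial node type
    ClusterType="single-node",          # one node = one month of free hours
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",  # placeholder; use a real secret
    DBName="dev",
)
```

Remember to shut the cluster down when you are done testing in order to conserve your free hours.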
Big Data Webinar
If you want to learn more, do not miss the AWS Big Data Webinar showcasing how startup Couchsurfing used Attunity’s continuous CDC to reduce their ETL process from 3 months to 3 hours and cut costs by nearly $40K.
Amazon WorkSpaces provides a persistent, cloud-based desktop experience that can be accessed from a variety of devices including PC and Mac desktops and laptops, iPads, Kindle Fires, and Android tablets.
Support for PCoIP Zero Clients
Today we are making WorkSpaces even more flexible by adding support for PCoIP zero clients. WorkSpaces desktops are rendered on the server and then transmitted to the endpoint as a highly compressed bitmap via the PCoIP protocol.
Zero clients are simple, secure, single-purpose clients that are equipped with a monitor, keyboard, mouse, and other peripherals. The clients use a dedicated PCoIP chipset for bitmap decompression and decoding and require very little in the way of local software maintenance (there is no operating system running on the device), making them a great match for Amazon WorkSpaces.
You can use any zero client device that contains the Teradici Tera 2 zero client chipset. Currently, over 30 hardware manufacturers provide such devices; check Teradici's supported devices list for more information.
In order to connect your existing zero clients to Amazon WorkSpaces, first verify that they are running version 4.6.0 (or newer) of the PCoIP firmware.
You will need to run the PCoIP Connection Manager authentication appliance in a Virtual Private Cloud. The Connection Manager is built on Ubuntu 12.04 LTS and is available as an HVM AMI. It brokers the authentication process and enables the creation of streaming sessions from WorkSpaces to the clients, thereby offloading all non-streaming work from the clients. The Connection Manager must be run in the VPC that hosts your Amazon WorkSpaces endpoint.
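As a rough sketch, launching the Connection Manager appliance looks like any other HVM AMI launch into a VPC subnet; the AMI and subnet IDs below are placeholders (the admin guide lists the real ones and the recommended sizing):

```python
# Hypothetical sketch: launch the PCoIP Connection Manager AMI into
# the VPC that hosts your Amazon WorkSpaces endpoint.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.run_instances(
    ImageId="ami-00000000",      # placeholder for the Connection Manager HVM AMI
    InstanceType="m3.medium",    # sizing is an assumption; see the admin guide
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-00000000",  # a subnet in your WorkSpaces VPC
)
```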
To learn more about this important new AWS feature, read the PCoIP Zero Client Admin Guide.
The new Quick Start Reference Deployment Guide for Cloudera Enterprise Data Hub does exactly what the title suggests! The comprehensive (20-page) guide includes the architectural considerations and configuration steps that will help you to launch the new Cloudera Director and an associated Cloudera Enterprise Data Hub (EDH) in a matter of minutes. As the folks at Cloudera said in their blog post, "Cloudera Director delivers an enterprise-class, elastic, self-service experience for Hadoop in cloud environments."
The reference deployment takes the form of a twelve-node cluster that will cost between $12 and $82 per hour in the US East (Northern Virginia) Region, depending on the instance type that you choose to deploy.
The cluster runs within a Virtual Private Cloud that includes public and private subnets, a NAT instance, security groups, a placement group for low-latency networking within the cluster, and an IAM role. The EDH cluster is fully customizable and includes worker nodes, edge nodes, and management nodes, each running on the EC2 instance type that you designate:
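Quick Starts are delivered as CloudFormation templates, so you can also kick off the deployment from code. Here's a hedged sketch with boto3; the template URL, stack name, and parameter names are placeholders (the guide lists the real ones):

```python
# Sketch: launch the Quick Start's CloudFormation template.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="cloudera-edh",
    TemplateURL="https://s3.amazonaws.com/example-bucket/cloudera-edh.template",
    Parameters=[
        {"ParameterKey": "KeyName", "ParameterValue": "my-key-pair"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # the deployment creates an IAM role
)
```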
Version 3 of the AWS SDK for PHP is now in Developer Preview and available on GitHub and Composer. Along with significant performance improvements, v3 brings a number of brand-new features. Using the new version, you can now make asynchronous requests with Futures and Promises, query API responses with JMESPath, and bind default request parameters to service clients.
The new SDK also includes some important architectural improvements: you can provide your own HTTP adapter and take advantage of a more streamlined event system offered by the latest version of the Guzzle library.
The development team is happy to be able to share this preview and is looking forward to your feedback. You can send it to them via GitHub Issues.
Amazon Elastic Transcoder lets you convert media files into versions that will play on smartphones, tablets, and PCs. It manages all of the tedious aspects of the media transcoding process on your behalf; you simply create a transcoding Job, specify the location of the source media file, and indicate which output format(s) you need. Elastic Transcoder will take care of everything else; you don't have to worry about hardware, software, scaling, tuning, or licensing.
HLS v4 Support
Today we are adding support for version 4 of the HTTP Live Streaming (HLS) protocol. This adaptive streaming protocol is commonly used by newer iOS (5+) and Android (4.4+) devices.
Our support for version 4 of HLS includes the following features:
- Byte-Range Requests - Each transcoded file contains enough information to allow the client media player to request video segments as needed. This obviates the need to create and manage thousands of smaller files. With this change, Elastic Transcoder generates one file per bitrate instead of one file per video segment per bitrate.
- Late Binding Audio - Audio and video can now be streamed separately. This allows you to reuse the same audio file with any number of video files in order to eliminate redundant storage and superfluous data transfer.
- I-Frame Only Playback - This feature enables "trick play" features such as enhanced fast forward, rewind, and seeking. It works by generating a separate playlist that consists solely of I-frames (short for "intra frames"), known as key frames in other video encoding formats. To make a long story short, an I-frame does not depend on any other frames in the stream and can be displayed in its entirety. If you are curious, Apple Technical Note 2288 (Example Playlist Files for use with HTTP Live Streaming) contains more info.
Generating HLS v4 Output
You can make use of this new feature by simply selecting the HLSv4 Playlist Format:
To allow for better file management, you can now include the "/" character in the name of your HLS v3, HLS v4, and Smooth Streaming master playlist. If you choose to do this, you need to make sure that all of the outputs in your playlist are also saved to the same folder or subfolder by using a common prefix in the Output Key fields.
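To make this concrete, here is a hedged sketch of an HLS v4 job created with boto3; the pipeline ID, preset ID, and key names are placeholders (use the real preset IDs from your account):

```python
# Sketch: create a transcoding job that emits an HLS v4 master playlist.
import boto3

et = boto3.client("elastictranscoder", region_name="us-east-1")

et.create_job(
    PipelineId="1111111111111-abcde1",        # placeholder pipeline ID
    Input={"Key": "input/source.mp4"},
    Outputs=[
        {"Key": "hls/video-1m",               # shared "hls/" prefix keeps the
         "PresetId": "0000000000000-000000",  # playlist and outputs together
         "SegmentDuration": "10"},
    ],
    Playlists=[
        {"Name": "hls/master",                # "/" is now allowed in the name
         "Format": "HLSv4",
         "OutputKeys": ["hls/video-1m"]},
    ],
)
```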
This new feature is available now and you can start transcoding your content into HLS v4 today!
Are you familiar with the Internet of Things? Before too long, many of the devices in your life will be connected to the Internet and to the cloud on a full or part-time basis. They will generate a constant stream of position, condition, and sensor data and send it to the AWS Cloud for storage (S3 and DynamoDB), messaging (SNS), identity (Cognito), real-time processing (Kinesis), and analysis (Redshift and Elastic MapReduce).
For example, I have a Fuse in one of my vehicles. I plugged it into the OBD port, configured my account, and was good to go. The device sends vehicle condition information (pulled from the OBD data) and position data (from its built-in GPS) to the cloud using a built-in cellular modem. I can easily track the position and condition of my car, regardless of which member of my family happens to be driving it.
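To sketch the device-to-cloud flow described above, here's a hypothetical Python example that pushes one position/condition reading into a Kinesis stream (the stream name, identifiers, and payload fields are all assumptions):

```python
# Sketch: send a vehicle telemetry reading to Amazon Kinesis.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

reading = {
    "vehicle_id": "my-car",           # placeholder identifier
    "lat": 47.6062, "lon": -122.3321, # position from the built-in GPS
    "engine_rpm": 2200,               # e.g. a value pulled from the OBD port
}

kinesis.put_record(
    StreamName="vehicle-telemetry",   # an assumed stream name
    Data=json.dumps(reading),
    PartitionKey=reading["vehicle_id"],
)
```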
IoT HackDay at re:Invent
Our friends at Intel and Spark Labs will bring their latest microcontrollers and boards, along with a big pile of sensors, wires, LEDs, and breadboards to help you get started. Bring your laptop, your imagination, and some good ideas; we'll have plenty of geeky giveaways and plenty of fuel -- snacks, coffee, and beer!
Space is limited and we can only accommodate 200 re:Invent attendees. To sign up, log in to your re:Invent account and add this event to your schedule. If you want to create a team ahead of time, register and then stand by for our pre-event survey. We'll collect enough additional information at that time to put you together with your team.
PS - Don't confuse this with the re:Invent Tatonka Challenge, which could also be thought of as the Internet of Wings.
Big changes in the technology world seem to come about in two ways. Sometimes there's a big splashy announcement and a visible public leap into the future. Most of the time, however, change is a bit more subtle. Early adopters find a new technology that makes them more productive and share it amongst themselves. Over time the news spreads to others, and at some point the once-new technology (for those who haven't been paying attention) seems to have become very popular overnight!

This technology adoption model can be seen in the recent growth in the popularity of container computing, exemplified by the rising awareness of Docker. Containers are lightweight, portable, and self-sufficient. Even better, they can be run in a wide variety of environments. You can, if you'd like, build and test a container locally and then deploy it to Amazon Elastic Compute Cloud (EC2) for production.
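Here's a minimal sketch of that local build-and-test loop using the Docker SDK for Python (the "docker" package); the image name and port mapping are assumptions:

```python
# Sketch: build an image from a local Dockerfile and run it.
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory...
image, _ = client.images.build(path=".", tag="myapp:latest")

# ...then run it locally, exactly as it would later run on an EC2 instance.
container = client.containers.run(
    "myapp:latest", detach=True, ports={"8080/tcp": 8080}
)
print(container.logs())
```

The same image, unchanged, can then be deployed to EC2, which is the consistency benefit described below.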
Benefits of Container Computing
Let's take a closer look at some of the benefits that accrue when you create your cloud-based application as a collection of containers, each specified declaratively and mapped to a single, highly specific aspect of your architecture:
- Consistency & Fidelity - There's nothing worse than creating something that works great in a test environment yet fails or runs inconsistently when moved to production. When you are building and releasing code in an agile fashion, wasting time debugging issues that arise from differences between environments is a huge barrier to productivity. The declarative, all-inclusive packaging model used by Docker gives you the power to enumerate your application's dependencies. Your application will have access to the same libraries and utilities, regardless of where it is running.
- Distributed Application Platform - If you build your application as a set of distributed services, each in a Docker container running on CoreOS, they can easily find and connect to each other, perhaps with the aid of a scheduler like Mesosphere. This will allow you to deploy and then easily scale containers across a "grid" of EC2 instances.
- Development Efficiency - Building your application as a collection of tight, focused containers allows you to build them in parallel with strict, well-defined interfaces. With better interfaces between moving parts, you have the freedom to improve and even totally revise implementations without fear of breaking running code. Because your application's dependencies are spelled out explicitly and declaratively, less time will be lost diagnosing, identifying, and fixing issues that arise from missing or obsolete packages.
- Operational Efficiency - Using containers allows you to build components that run in isolated environments (limiting the ability of one container to accidentally disrupt the operation of another) while still being able to cooperatively share libraries and other common resources. This opportunistic sharing reduces memory pressure and leads to increased runtime efficiency. If you are running on EC2 (Docker is directly supported on the Amazon Linux AMI and on AWS Elastic Beanstalk, and can easily be used with AWS OpsWorks), you can achieve isolation without running each component on a separate instance. Containers are not a replacement for instances; they are destined to run on them!
Container Computing Resources
In order to prepare to write this post, I spent some time reading up on container computing and Docker. Here are the articles, blog posts, and videos that I liked the best:
- A Better Dev/Test Experience: Docker and AWS - This detailed post shows you how to use Docker to implement a complete dev and test environment on EC2.
- The Docker Book - This book by James Turnbull dives deep into the nuts and bolts and provides a thorough and practical introduction to Docker.
- Running Docker on AWS OpsWorks - This brand-new post from my colleague Chris Barclay shows you how to create a Docker layer in your OpsWorks application.
- AWS Elastic Beanstalk for Docker - My post from earlier this year shows you how to build and test applications locally and to deploy them on Elastic Beanstalk. The Dockerizing a Python Web App post may also be helpful.
- Running Docker on AWS OpsWorks - This video from my colleague Jonathan Weiss shows you how to run Docker on AWS using AWS OpsWorks.
- AWS Elastic Beanstalk and Docker - This video from my colleague Evan Brown shows you how to use a Dockerfile to deploy your containers. He also highlights best practices for security & secret management, logging, scaling, and monitoring.
I am really excited by container computing and hope that you are as well. Please feel free to share additional resources and success stories with me and I'll update this post and our new page accordingly.
In June of this year I introduced you to the new SSD-backed Elastic Block Storage option for EC2. Just a few months after release, this new option (formally known as General Purpose (SSD)) is already being used for about 90% of the newly created EBS volumes. Our customers have told us that they love the consistent baseline performance (3 IOPS per GB of provisioned storage) and the ability to burst up to 3,000 IOPS without regard to the amount of provisioned storage.
Today we are bringing the same consistent baseline performance and bursting capability to Amazon Relational Database Service (RDS). As an RDS user, you now have your choice of three different types of storage:
- General Purpose (SSD) storage is suitable for a wide variety of database workloads that have moderate I/O requirements. The baseline of 3 IOPS per GB and the ability to burst up to 3,000 IOPS will provide you with predictable performance well-suited to many applications.
- Provisioned IOPS (SSD) storage is ideal for the most demanding database workloads, including OLTP. This storage provides the most consistent performance and allows you to provision between 1,000 and 30,000 IOPS as required by your application.
- Magnetic Storage (formerly known as RDS Standard storage) is a good match for small database workloads where data is accessed infrequently.
You can use the new General Purpose (SSD) storage in conjunction with other powerful RDS features such as Multi-AZ, Read Replicas, and Amazon Virtual Private Cloud. Pricing for this new option starts at $0.115 per GB per month in the US East (Northern Virginia) Region. As usual, full pricing information can be found on the RDS Pricing page.
Get Started Today
This new storage option is available in all AWS Regions and can be used with the MySQL, Oracle, PostgreSQL, and SQL Server database engines.
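If you want to try it from code, here's a hedged sketch using boto3 (the identifier, instance class, and credentials are placeholders):

```python
# Sketch: launch an RDS instance on General Purpose (SSD) storage.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="mydb-gp2",
    Engine="mysql",
    DBInstanceClass="db.m3.medium",     # sizing is an assumption
    AllocatedStorage=100,               # 100 GB x 3 IOPS/GB = 300 IOPS baseline
    StorageType="gp2",                  # General Purpose (SSD)
    MasterUsername="admin",
    MasterUserPassword="ChangeMe123!",  # placeholder; use a real secret
)
```

With 100 GB of provisioned storage, the 3 IOPS per GB baseline works out to 300 IOPS, with the ability to burst up to 3,000 IOPS.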