AWS Blog

GE Oil & Gas – Digital Transformation in the Cloud

by Jeff Barr | on | in Enterprise, Guest Post | | Comments

GE Oil & Gas is a relatively young division of General Electric, the product of a series of acquisitions made by parent company General Electric starting in the late 1980s. Today GE Oil &Gas is pioneering the digital transformation of the company. In the guest post below, Ben Cabanas, the CTO of GE Transportation and formerly the cloud architect for GE Oil & Gas, talks about some of the key steps involved in a major enterprise cloud migration, the theme of his recent presentation at the 2016 AWS Summit in Sydney, Australia.

You may also want to learn more about Enterprise Cloud Computing with AWS.


Challenges and Transformation
GE Oil & Gas is at the forefront of GE’s digital transformation, a key strategy for the company going forward. The division is also operating at a time when the industry is facing enormous competitive and cost challenges, so embracing technological innovation is essential. As GE CIO Jim Fowler has noted, today’s industrial companies have to become digital innovators to thrive.

Moving to the cloud is a central part of this transformation for GE. Of course, that’s easier said than done for a large enterprise division of our size, global reach, and role in the industry. GE Oil & Gas has more than 45,000 employees working across 11 different regions and seven research centers. About 85 percent of the world’s offshore oil rigs use our drilling systems, and we spend $5 billion annually on energy-related research and development—work that benefits the entire industry. To support all of that work, GE Oil & Gas has about 900 applications, part of a far larger portfolio of about 9,000 apps used across GE. A lot of those apps may have 100 users or fewer, but are still vital to the business, so it’s a huge undertaking to move them to the cloud.

Our cloud journey started in late 2013 with a couple of goals. We wanted to improve productivity in our shop floors and manufacturing operations. We sought to build applications and solutions that could reduce downtime and improve operations. Most importantly, we wanted to cut costs while improving the speed and agility of our IT processes and infrastructure.

Iterative Steps
Working with AWS Professional Services and Sogeti, we launched the cloud initiative in 2013 with a highly iterative approach. In the beginning, we didn’t know what we didn’t know, and had to learn agile as well as how to move apps to the cloud. We took steps that, in retrospect, were crucial in supporting later success and accelerated cloud adoption. For example, we sent more than 50 employees to Seattle for training and immersion in AWS technologies so we could keep critical technical IP in-house. We built foundational services on AWS, such as monitoring, backup, DNS, and SSO automation that, after a year or so, fostered the operational maturity to speed the cloud journey. In the process, we discovered that by using AWS, we can build things at a much faster pace than what we could ever accomplish doing it internally.

Moving to AWS has delivered both cost and operational benefits to GE Oil & Gas.

We architected for resilience, and strove to automate as much as possible to reduce touch times. Because automation was an overriding consideration, we created a “bot army” that is aligned with loosely coupled microservices to support continuous development without sacrificing corporate governance and security practices. We built in security at every layer with smart designs that could insulate and protect GE in the cloud, and set out to measure as much as we could—TCO, benchmarks, KPIs, and business outcomes. We also tagged everything for greater accountability and to understand the architecture and business value of the applications in the portfolio.

Moving Forward
All of these efforts are now starting to pay off. To date, we’ve realized a 52 percent reduction in TCO. That stems from a number of factors, including the bot-enabled automation, a push for self-service, dynamic storage allocation, using lower-cost VMs when possible, shutting off compute instances when they’re not needed, and moving from Oracle to Amazon Aurora. Ultimately, these savings are a byproduct of doing the right thing.

The other big return we’ve seen so far is an increase in productivity. With more resilient, cloud-enabled applications and a focus on self-service capability, we’re getting close to a “NoOps” environment, one where we can move away from “DevOps” and “ArchOps,” and all the other “ops,” using automation and orchestration to scale effectively without needing an army of people. We’ve also seen a 50 percent reduction in “tickets” and a 98 percent reduction in impactful business outages and incidents—an unexpected benefit that is as valuable as the cost savings.

For large organizations, the cloud journey is an extended process. But we’re seeing clear benefits and, from the emerging metrics, can draw a few conclusions. NoOps is our future, and automation is essential for speed and agility—although robust monitoring and automation require investments of skill, time, and money. People with the right skills sets and passion are a must, and it’s important to have plenty of good talent in-house. It’s essential to partner with business leaders and application owners in the organization to minimize friction and resistance to what is a major business transition. And we’ve found AWS to be a valuable service provider. AWS has helped move a business that was grounded in legacy IT to an organization that is far more agile and cost efficient in a transformation that is adding value to our business and to our people.

— Ben Cabanas, Chief Technology Officer, GE Transportation


Register Now – AWS DevDay in San Francisco

by Jeff Barr | on | in Developers, Events | | Comments

I am a firm believer in the value of continuing education. These days, the half-life on knowledge of any particular technical topic seems to be less than a year. Put another way, once you stop learning your knowledge base will be just about obsolete within 2 or 3 years!

In order to make sure that you stay on top of your field, you need to decide to learn something new every week. Continuous learning will leave you in a great position to capitalize on the latest and greatest languages, tools, and technologies. By committing to a career marked by lifelong learning, you can be sure that your skills will remain relevant in the face of all of this change.

Keeping all of this in mind, I am happy to be able to announce that we will be holding an AWS DevDay in San Francisco on June 21st.The day will be packed with technical sessions, live demos, and hands-on workshops, all focused on some of today’s hottest and most relevant topics. If you attend the AWS DevDay, you will also have the opportunity to meet and speak with AWS engineers and to network with the AWS technical community.

Here are the tracks:

  • Serverless – Build and run applications without having to provision, manage, or scale infrastructure. We will demonstrate how you can build a range of applications from data processing systems to mobile backends to web applications.
  • Containers – Package your application’s code, configurations, and dependencies into easy-to-use building blocks. Learn how to run Docker-enabled applications on AWS.
  • IoT – Get the most out of connecting IoT devices to the cloud with AWS. We will highlight best practices using the cloud for IoT applications, connecting devices with AWS IoT, and using AWS endpoints.
  • Mobile – When developing mobile apps, you want to focus on the activities that make your app great and not the heavy lifting required to build, manage, and scale the backend infrastructure. We will demonstrate how AWS helps you easily develop and test your mobile apps and scale to millions of users.

We will also be running a series of hands-on workshops that day:

  • Zombie Apocalypse Workshop: Building Serverless Microservices.
  • Develop a Snapchat Clone on AWS.
  • Connecting to AWS IoT.

Registration and Location
There’s no charge for this event, but space is limited and you need to register quickly in order to attend.

All sessions will take place at the AMC Metreon at 135 4th Street in San Francisco.





Hot Startups on AWS – April 2016 – Robinhood, Dubsmash, Sharethrough

by Jeff Barr | on | in Startups | | Comments

Continuing with our focus on hot AWS-powered startups (see Hot Startups on AWS – March 2016 for more info), this month I would like to tell you about:

  • Robinhood – Free stock trading to democratize access to financial markets.
  • Dubsmash – Bringing joy to communication through video.
  • Sharethrough – An all-in-one native advertising platform.

The founders of Robinhood graduated from Stanford and then moved to New York to build trading platforms for some of the largest financial institutions in the world. After seeing that these institutions charged investors up to $10 to place trades that cost almost nothing, they moved back to California with the goal of democratizing access to the markets and empowering personal investors.

Starting with the idea that a technology-driven brokerage could operate with significantly less overhead than a traditional firm, they built a self-serve service that allows customers to sign up in less than 4 minutes. To date, their customers have transacted over 3 billion dollars while saving over $100 million dollars in commissions.

After a lot of positive pre-launch publicity, Robinhood debuted with a waiting list of nearly a million people. Needless to say, they had to pay attention to scale from the very beginning. Using 18 distinct AWS services, a beginning team of just two DevOps people built the entire system. They use AWS Identity and Access Management (IAM) to regulate access to services and to data, simplifying their all-important compliance efforts. The Robinhood data science team uses Amazon Redshift to help identify possible instances of fraud and money laundering. Next on the list is international expansion, with plans to make use of multiple AWS Regions.

The founders of Dubsmash had previously worked together to create several video-powered applications. As the cameras in smartphones continued to improve, they saw an opportunity to create a platform that would empower people to express themselves visually. Starting simple, they built their first prototype in a couple of hours. The functionality was minimal: play a sound, select a sound, record a video, and share. The initial response was positive and they set out to build the actual product.

The resulting product, Dubsmash, allows users to combine video with popular sound bites and to share the videos online – with a focus on modern messaging apps. The founders began working on the app in the summer of 2014 and launched the first version the following November. Within a week it reached the top spot in the German App Store. As often happens, early Dubsmash users have put the app to use in intriguing and unanticipated ways. For example, Eric Bruce uses Dubsmash to create entertaining videos of him and his young son Jack to share with Priscilla (Eric’s wife / Jack’s mother) (read Watch A Father and His Baby Son Adorably Master Dubsmash to learn more).

Dubsmash uses Amazon Simple Storage Service (S3) for video storage, with content served up through Amazon CloudFront.  They have successfully scaled up from their MVP and now handle requests from millions of users. To learn more about their journey, read their blog post, How to Serve Millions of Mobile Clients with a Single Core Server.

Way back in 2008, a pair of Stanford graduate students were studying the concept of virality and wanted to create ads that would deserve your attention rather than simply stealing it. They created Sharethrough, an all-in-one native advertising platform for publishers, app developers, and advertisers. Today the company employs more than 170 people and serves over 3 billion native ad impressions per month.

Sharethrough includes a mobile-first content-driven platform designed to engage users with quality content that is integrated into the sites where it resides. This allows publishers to run premium ads and to maintain a high-quality user experience. They recently launched an AI-powered guide that helps to maximize the effectiveness of ad headlines.

Sharethrough’s infrastructure is hosted on AWS, where they make use of over a dozen high-bandwidth services including Kinesis and Dynamo, for the scale of the technical challenges they face. Relying on AWS allows them to focus on their infrastructure-as-code approach, utilizing tools like Packer and Terraform for provisioning, configuration and deployment. Read their blog post (Ops-ing with Packer and Terraform) to learn more.




They’re Here – Longer EBS and Storage Gateway Resource IDs Now Available

by Jeff Barr | on | in Amazon EC2, Amazon Elastic Block Store, AWS Storage Gateway | | Comments

Last November I let you know that were were planning to increase the length of the resource IDs for EC2 instances, reservations, EBS volumes, and snapshots in 2016. Early this year I showed you how to opt in to the new format for EC2 instances and EC2 reservations.

Effective today you can now opt in to the new format for volumes and snapshots for EBS and Storage Gateway.

As I said earlier:

If you build libraries, tools, or applications that make direct calls to the AWS API, now is the time to opt in and to start your testing process! If you store the IDs in memory or in a database, take a close look at fixed-length fields, data structures, schema elements, string operations, and regular expressions. Resources that were created before you opt in will retain their existing short identifiers; be sure that your revised code can still handle them!

You can opt in to the new format using the AWS Management Console, the AWS Command Line Interface (CLI), the AWS Tools for Windows PowerShell, or by calling the ModifyIdFormat API function.

Opting In – Console
To opt in via the Console, simply log in, choose EC2, and click on Resource ID length management:

Then click on Use Longer IDs for the desired resource types:

Note that volume applies to EBS volumes and to Storage Gateway volumes and that snapshot applies to EBS snapshots (both direct and through Storage Gateway).

For information on using the AWS Command Line Interface (CLI) or the AWS Tools for Windows PowerShell, take a look at They’re Here – Longer EC2 Resource IDs Now Available.

Things to Know
Here are a couple of things to keep in mind as you transition to the new resource IDs:

  1. Some of the older versions of the AWS SDKs and CLIs are not compatible with the new format. Visit the Longer EC2 and EBS Resource IDs FAQ for more information on compatibility.
  2. New AWS Regions get longer instance, reservation, volume, and snapshot IDs by default. You can opt out for Regions that launch between now and December 2016.
  3. Starting on April 28, 2016, new accounts in all commercial regions except Beijing (China) and AWS GovCloud (US) will get longer instance and reservation IDs by default, again with the ability to opt out.



Autheos – At the Nexus of Marketing and E-Commerce

by Jeff Barr | on | in Customer Success, Guest Post | | Comments

In today’s guest post, Leon Mergen, CTO of Autheos, reviews their company history and their move to AWS.


Adding video to a product page on an e-commerce site is perhaps the single most effective way to drive increased sales — studies have shown sales conversion rates can go up by more than two thirds. In addition, product video viewing data fills a gaping hole in a brand’s / supplier’s ability to assess the effectiveness of their online and offline marketing efforts at driving e-commerce sales. We had built an OK product video distribution platform… but we knew we couldn’t scale globally with the technology we were using. So, in September last year, we decided to transition to AWS, and, while doing so built an e-commerce marketing support tool for Brands which, judging by customer response, is a game changer. This is our story.

The Perils of Good Fortune
Autheos was founded in 2012 when the biggest Webshop in Holland and Belgium asked us to turn an existing piece of technology into a video hosting solution that would automatically find and insert product videos into their product sales pages.  A startup rarely finds itself in a better position to start, so we jumped right in and started coding.  Which was, in retrospect, a mistake for two reasons.

For one thing, we grew too fast.  When you have a great client that really wants your product, the natural reaction is to build it as fast as you can.  So, since there wasn’t a team in place, we (too) quickly on-boarded engineers and outsourced several components to remote development shops, which resulted in classic issues of communication problems and technical incompatibilities.

More importantly, however, since we already had an existing piece of technology, we didn’t take the time to think how we would build it if we were starting from scratch.  It seemed like it would be quicker to adapt it to the new requirements.  And kind of like a home-owner who opts for renovation instead of tear-down and rebuild, we had to make all sorts of compromises as a result.

However, thanks to many all-nighters we managed to meet the deadline and launch a platform that allowed brands such as Philips, LEGO, L’Oreal, and Bethesda to upload product videos (commercials, guides, reviews, and so forth) for free and tag them with a product code and language.

The webshops integrated a small piece of javascript code that enabled them to query our video database in real-time with a product code and language, display a custom button if a video was found, and pop up the right videos(s) for the product, in the desired language.

Click here to see an example video on (the biggest webshop in Benelux); our video is behind the button.

The results: less work for the webshop (no more manual gathering of videos, decoding/encoding, hosting and matching them with the right products) and more sales. Our client convinced its Brands to start uploading their videos, and kickstarted our exponential growth. Soon we had so many Brands using our platform, and so many videos in our database, that nearly all major webshops in Benelux wanted to work with us as well (often pushed to do so by Brands, who didn’t want the hassle of interfacing / integrating with many different webshops).

This might sound great, but remember how we built the product in a rush with legacy code?  After three years of fire-fighting, interspersed with frequent moments of disbelief when we found out that certain features we wanted to offer were impossible due to limitations in our backend, we decided enough was enough… it was time to start over.

A New Beginning with AWS
Our key requirements were that we needed to seamlessly scale globally, log and process all of our data, and provide high performance access to our ever growing database of product videos. Besides this, we needed to make sure we could ship new features and products quickly without impacting wider operations. Oh, and we wanted to be up and running with the new platform in 6 months. As the de-facto standard for web applications, the choice of AWS was an easy one. However, we soon realized that it wasn’t just an easy decision, it was a really smart one too.

Elastic Transcoder was the main reason for us to decide to go with AWS. Before working with ET, we used a custom transcoding service that had been built by an outsourced company in Eastern Europe. As a result of hosting the service there on antiquated servers, the transcoding service suffered from lots of downtime, and caused many headaches. Elastic Transcoder allows us to forget about all these problems, and gives us stable transcoding service which we can scale on-demand.

When we moved our application servers to AWS, we also activated Amazon CloudFront. This was a no-brainer for us even though there are many other CDNs available, as CloudFront integrates unbelievably well within AWS. Essentially it just worked. With a few clicks we were able to build a transcoding pipeline that directly uploads its result to CloudFront. We make a single API call, and AWS takes care of the rest, including CDN hosting. It’s really that easy.

As we generate a huge number of log records every day, we had to make sure these were stored in a flexible and scalable environment. A regular PostgreSQL server would have worked, however, this would never have been cost-efficient at our scale. So we started running some prototypes with Amazon Redshift, the PostgreSQL compatible data warehousing solution by AWS. We set up Kinesis Firehose to stream data from our application servers to Amazon Redshift, writing it off in batches (in essence creating a full ETL process as a service), something that would have taken a major effort with a traditional webhost. Doing this outside of AWS would have taken months; with AWS we managed to set all of this up in three days.

Managing this data through data mining frameworks was the next big challenge, for which many solutions exist in market. However, Amazon has great solutions in an integrated platform that enabled us to test and implement rapidly. For batch processing we use Spark, provided by Amazon EMR. For temporary hooking into data streams – e.g. our monitoring systems – we use AWS Data Pipeline, which gives us access to the stream of data as it is generated by our application servers, comparable to what Apache Kafka would give you.

Everything we use is accessible through an SDK, which allows us to run integration tests effectively in an isolated environment. Instead of having to mock services, or setting up temporary services locally and in our CI environment, we use the AWS SDK to easily create and clean up AWS services. The flexibility and operational effectiveness this brings is incredible, as our whole production environment can be replicated in a programmable setup, in which we can simulate specific experiments. Furthermore, we catch many more problems by actually integrating all services in all automated tests, something you would otherwise only catch during manual testing / staging.

Through AWS CloudFormation and AWS CodeDeploy we seamlessly built our cloud using templates, and integrated this with our testing systems in order to support our Continuous Deployment setup. We could, of course, have used Chef or Puppet with traditional webhosts, but the key benefit in using the AWS services for this is that we have instant access to a comprehensive ecosystem of tools and features with which we can integrate (and de-integrate) as we go.

Unexpected Bounty
One month in, things were going so smoothly that we did something that we had never done before in the history of the company:  we expanded our goals during a project without pushing out the delivery date.  We always knew that we had data that could be really valuable for Brands, but since our previous infrastructure made it really difficult to access or work with this data, we had basically ignored it.  However, when we had just finished our migration to Redshift, one of our developers read an article about the powerful combination of Redshift and Periscope.  So we decided to prototype an e-commerce data analysis tool.

A smooth connection with our Redshift tables was made almost instantly, and we saw our 500+ million records visualized in a few graphs that the Periscope team prepared for us.  Jaws dropped and our product manager went ahead and built an MVP. A few weeks of SQL courses, IRC spamming and nagging the Periscope support team later, and we had an alpha product.

We have shown this to a dozen major Brands and the response has been all we could hope for… a classic case of the fabled product / market fit. And it would not have happened without AWS.

An example of the dashboard for one of our Founding Partners (a global game development company).

With a state of the art platform, promising new products, and the backend infrastructure to support global viral growth we finally had a company that could attract the attention of professional investors… and within a few weeks of making our new pitch we had closed our first outside investment round.

We’ve come a long way from working with a bare bones transcoding server, to building a scalable infrastructure and best-in-class products that are ready to take over the world!

Our very first transcoding server.

What’s Next?
Driving viral spread globally to increase network effects, we are signing up new Webshops and Brands at a tremendous pace.  We are putting the finishing touches on the first version of our ecommerce data analysis product for Brand marketers, and speccing out additional products and features for Brands and Webshops working with the Autheos Network.  And of course we are looking for amazing team members to help make this happen. If you would like to join us on the next stage of our journey, please look at our website for current openings — and yes, we are looking for DevOps engineers!

And lastly, since this is the Amazon Web Services blog, we can’t resist being cheeky and thus herewith take the opportunity to invite Mr. Bezos to sit down with us to see if we can become the global product video partner for Amazon.  One thing’s for sure: our infrastructure is the best!

— Leon Mergen, CTO –

Machine Learning, Recommendation Systems, and Data Analysis at Cloud Academy

by Jeff Barr | on | in Amazon RDS, Amazon S3, AWS Lambda, Guest Post | | Comments

In today’s guest post, Alex Casalboni and Giacomo Marinangeli of Cloud Academy discuss the design and development of their new Inspire system.


Our Challenge
Mixing technology and content has been our mission at Cloud Academy since the very early days. We are builders and we love technology, but we also know content is king. Serving our members with the best content and creating smart technology to automate it is what kept us up at night for a long time.

Companies are always fighting for people’s time and attention and at Cloud Academy, we face those same challenges as well. Our goal is to empower people, help them learn new Cloud skills every month, but we kept asking ourselves: “How much content is enough? How can we understand our customer’s goals and help them select the best learning paths?”

With this vision in mind about six months ago we created a project called Inspire which focuses on machine learning, recommendation systems and data analysis. Inspire solves our problem on two fronts. First, we see an incredible opportunity in improving the way we serve our content to our customers. It will allow us to provide better suggestions and create dedicated learning paths based on an individual’s skills, objectives and industries. Second, Inspire represented an incredible opportunity to improve our operations. We manage content that requires constant updates across multiple platforms with a continuously growing library of new technologies.

For instance, getting a notification to train on a new EC2 scenario that you’re using in your project can really make a difference in the way you learn new skills. By collecting data across our entire product, such as when you watch a video or when you’re completing an AWS quiz, we can gather that information to feed Inspire. Day by day, it keeps personalising your experience through different channels inside our product. The end result is a unique learning experience that will follow you throughout your entire journey and enable a customized continuous training approach based on your skills, job and goals.

Inspire: Powered by AWS
Inspire is heavily based on machine learning and AI technologies, enabled by our internal team of data scientists and engineers. Technically, this involves several machine learning models, which are trained on the huge amount of collected data. Once the Inspire models are fully trained, they need to be deployed in order to serve new predictions, at scale.

Here the challenge has been designing, deploying and managing a multi-model architecture, capable of storing our datasets, automatically training, updating and A/B testing our machine learning models, and ultimately offering a user-friendly and uniform interface to our website and mobile apps (available for iPhone and Android).

From the very beginning, we decided to focus high availability and scalability. With this in mind, we designed an (almost) serverless architecture based on AWS Lambda. Every machine learning model we build is trained offline and then deployed as an independent Lambda function.

Given the current maximum execution time of 5 minutes, we still run the training phase on a separate EC2 Spot instance, which reads the dataset from our data warehouse (hosted on Amazon RDS), but we are looking forward to migrating this step to a Lambda function as well.

We are using Amazon API Gateway to manage RESTful resources and API credentials, by mapping each resource to a specific Lambda function.

The overall architecture is logically represented in the diagram below:

Both our website and mobile app can invoke Inspire with simple HTTPS calls through API Gateway. Each Lambda function logically represents a single model and aims at solving a specific problem. More in detail, each Lambda function loads its configuration by downloading the corresponding machine learning model from Amazon S3 (i.e. a serialized representation of it).

Behind the scenes, and without any impact on scalability or availability, an EC2 instance takes care of periodically updating these S3 objects, as outcome of the offline training phase.

Moreover, we want to A/B test and optimize our machine learning models: this is transparently handled in the Lambda function itself by means of SixPack, an open-source A/B testing framework which uses Redis.

Data Collection Pipeline
As far as data collection is concerned, we use as data hub: with a single API call, it allows us to log events into multiple external integrations, such as Google Analytics, Mixpanel, etc. We also developed our own custom integration (via webhook) in order to persistently store the same data in our AWS-powered data warehouse, based on Amazon RDS.

Every event we send to is forwarded to a Lambda function – passing through API Gateway – which takes care of storing real-time data into an SQS queue. We use this queue as a temporary buffer in order to avoid scalability and persistency problems, even during downtime or scheduled maintenance. The Lambda function also handles the authenticity of the received data thanks to a signature, uniquely provided by

Once raw data has been written onto the SQS queue, an elastic fleet of EC2 instances reads each individual event – hence removing it from the queue without conflicts – and writes it into our RDS data warehouse, after performing the required data transformations.

The serverless architecture we have chosen drastically reduces the costs and problems of our internal operations, besides providing high availability and scalability by default.

Our Lambda functions have a pretty constant average response time – even during load peaks – and the SQS temporary buffer makes sure we have a fairly unlimited time and storage tolerance before any data gets lost.

At the same time, our machine learning models won’t need to scale up in a vertical or distributed fashion since Lambda takes care of horizontal scaling. Currently, they have an incredibly low average response time of 1ms (or less):

We consider Inspire an enabler for everything we do from a product and content perspective, both for our customers and our operations. We’ve worked to make this the core of our technology, so that its contributions can quickly be adapted and integrated by everyone internally. In the near future, it will be able to independently make decisions for our content team while focusing on our customers’ need.  At the end of the day, Inspire really answers our team’s doubts on which content we should prioritize, what works better and exactly how much of it we need. Our ultimate goal is to improve our customer’s learning experience by making Cloud Academy smarter by building real intelligence.

Join our Webinar
If you would like to learn more about Inspire, please join our April 27th webinar – How we Use AWS for Machine Learning and Data Collection.

Alex Casalboni, Senior Software Engineer, Cloud Academy
Giacomo Marinangeli, CTO, Cloud Academy

PS – Cloud Academy is hiring – check out our open positions!

AWS Week in Review – April 18, 2016

by Jeff Barr | on | in Week in Review | | Comments

Let’s take a quick look at what happened in AWS-land last week:


April 18


April 19


April 20


April 21


April 22


April 23


April 24

New & Notable Open Source

New SlideShare Presentations

New Customer Success Stories

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.


AWS Webinars – April 2016

by Jeff Barr | on | in Webinars | | Comments

Our 2016 series of webinars continues with a strong set of 200-level topics in April. The webinars are free, but space is limited and you should sign up ahead of time if you would like to attend. Here’s what we have on the calendar for the last week of April (all times are Pacific):

Tuesday, April 26
Are you ready to launch and connect to your first EC2 instance? Do you want to learn how to use Amazon Simple Storage Service (S3) to store and share files? Our getting started webinar will show you how to do both.

Webinar: Getting Started with AWS (9 – 10 AM).

Do you want to learn how to use Apache Spark to analyze real-time streams of data on an Amazon EMR cluster? Do you want to know how to use Spark as part of a system that includes Amazon DynamoDB, Amazon Redshift, Amazon Kinesis, and other big data tools? This webinar will show you how to use Spark to address common big data use cases.

Webinar: Best Practices for Apache Spark on AWS (10:30 – 11:30 AM).

Are you interested in running a commercial relational database in the cloud? Do you want to know more about best practices for running single and multiple database instances, or do you have questions about costs and licensing? Attend this webinar to learn more about Amazon RDS running Oracle.

Webinar: RDS for Oracle: Quick Provision, Easy to Manage, Reduced Cost (Noon – 1 PM).

Wednesday, April 27
In today’s real-time world, going from raw data to insights as quickly as possible has become a must. Fortunately, a number of AWS tools can help you to capture, store, and analyze real-time streaming data. Attend this webinar to learn about Amazon Kinesis Streams, Lambda, and Spark Streaming on Amazon EMR.

Webinar: Getting Started with Real-Time Data Analytics on AWS (9 – 10 AM).

As you move your business and your applications to the cloud, you should also look at modernizing your development and deployment practices. For example, many AWS customers use tools like AWS CodePipeline and AWS CodeDeploy to implement continuous delivery. Attend this webinar to learn more about what this means and how to put it in to practice in your own organization.

Webinar: Getting Started with Continuous Delivery on AWS (10:30 – 11:30 AM).

Now that Amazon S3 is a decade old, we have a wealth of experience to share about the best ways to use it for backup, compliance, archiving, and many other purposes. This webinar will share best practices for keeping your data safe, and will also provide an overview of several different transfer services.

Webinar: S3 Best Practices: A Decade of Field Experience (Noon – 1 PM).

Thursday, April 28
AWS Lambda brings some new flexibility to the development and deployment process. When used in conjunction with AWS Storage Gateway, it can be used as the basis for an automated development workflow that easily supports distinct development, staging, and production environments. Attend this webinar to learn more.

Webinar: Continuous Delivery to AWS Lambda (9 AM – 10 AM).

Amazon Aurora is a MySQL-compatible database engine that can boost performance, reliability, and availability while reducing the total cost of ownership. Join this webinar to learn more about Aurora and to better understand how to migrate your existing on-premises or cloud-based databases to it.

Webinar: Migrating Your Databases to Amazon Aurora (10:30 – 11:30 AM).

Containers and microservices are both a natural fit for the cloud. Attend this webinar to learn more about the challenges that might arise and the best practices to address them.

Webinar: Running Microservices on Amazon ECS (Noon – 1 PM).



Amazon EMR Update – Apache HBase 1.2 Is Now Available

by Jeff Barr | on | in Amazon EMR | | Comments

Apache HBase is a distributed, scalable big data store designed to support tables with billions of rows and millions of columns. HBase runs on top of Hadoop and HDFS and can also be queried using MapReduce, Hive, and Pig jobs.

AWS customers use HBase for their ad tech, web analytics, and financial services workloads. They appreciate its scalability and the ease with which it handles time-series data.

HBase 1.2 on Amazon EMR
Today we are making version 1.2 of HBase available for use with Amazon EMR.  Here are some of the most important and powerful features and benefits that you get when you run HBase:

Strongly Consistent Reads and Writes – When a writer returns, all of the readers will see the same value.

Scalability – Individual HBase tables can be comprised of billions of rows and millions of columns. HBase stores data in a sparse form in order to conserve space. You can use column families and column prefixes to organize your schemas and to indicate to HBase that the members of the family have a similar access pattern. You can also use timestamps and versioning to retain old versions of cells.

Backup to S3 – You can use the HBase Export Snapshot tool to backup your tables to Amazon S3. The backup operation is actually a MapReduce job and uses parallel processing to adeptly handle large tables.

Graphs And Timeseries – You can use HBase as the foundation for a more specialized data store. For example, you can use Titan for graph databases and OpenTSDB for time series.

Coprocessors – You can write custom business logic (similar to a trigger or a stored procedure) that runs within HBase and participates in query and update processing (read The How To of HBase Coprocessors to learn more).

You also get easy provisioning and scaling, access to a pre-configured installation of HDFS, and automatic node replacement for increased durability.

Getting Started with HBase
HBase 1.2 is available as part of Amazon EMR release 4.6. You can, as usual, launch it from the Amazon EMR Console, the Amazon EMR CLI, or through the Amazon EMR API. Here’s the command that I used:

$ aws --region us-east-1 emr create-cluster \
  --name "MyCluster" --release-label "emr-4.6.0" \
  --instance-type m3.xlarge --instance-count 3 --use-default-roles \
  --ec2-attributes KeyName=keys-jbarr-us-east \
  --applications Name=Hadoop Name=Hue Name=HBase Name=Hive

This command assumes that the EMR_DefaultRole and EMR_EC2_DefaultRole IAM roles already exist. They are created automatically when you launch an EMR cluster from the Console (read about Create and Use Roles for Amazon EMR and Create and Use Roles with the AWS CLI to learn more).

I  found the master node’s DNS on the Cluster Details page and SSH’ed in as user hadoop. Then I ran a couple of HBase shell commands:

Following the directions in our new HBase Tutorial, I created a table called customer, restored a multi-million record snapshot from S3 into the table, and ran some simple queries:

Available Now
You can start using HBase 1.2 on Amazon EMR today. To learn more, read the Amazon EMR Documentation.


AWS Week in Review – April 11, 2016

by Jeff Barr | on | in Week in Review | | Comments

Let’s take a quick look at what happened in AWS-land last week:


April 11


April 12


April 13


April 14


April 15


April 16


April 17

New & Notable Open Source

  • cfn-include implements a Fn::Include for CloudFormation templates.
  • TumblessTemplates is a set of CloudFormation templates for quick setup of the Tumbless blogging platform.
  • s3git is Git for cloud storage.
  • s3_uploader is an S3 file uploader GUI written in Python.
  • SSH2EC2 lets you connect to EC2 instances via tags and metadata.
  • lambada is AWS Lambda for silly people.
  • aws-iam-proxy is a proxy that signs requests with IAM credentials.
  • hyperion is a Scala library and a set of abstractions for AWS Data Pipeline.
  • dynq is a DynamoDB query library.
  • cloud-custodian is a policy rules engine for AWS management.

New SlideShare Presentations

New Customer Success Stories

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.