AWS Open Source Blog

re:Invent open source highlights: Week 2

Over the past three weeks, re:Invent 2020 has had hundreds of sessions across different topics and tracks. This is the second post in the re:Invent highlights series, covering open source highlights from week two across various tracks and sessions. If you missed them, be sure to check out the highlights from week one and week three.

re:Invent is back on January 11th, 2021, so in the final post we'll let you know more about what you can expect.

Swami Sivasubramanian on stage for his keynote.

Announcements and launches

Week two featured several announcements, including exciting artificial intelligence (AI) and machine learning (ML) launches with great open source technologies and projects for you to experiment with.

Amazon Neptune ML

Amazon Neptune ML is a new capability that uses graph neural networks (GNNs), a machine learning technique purpose-built for graphs. Neptune ML uses the Deep Graph Library (DGL), an open source library to which AWS contributes that makes it easy to develop and apply GNN models to graph data. In Announcing Amazon Neptune ML: Easy, fast, and accurate predictions on graphs, George Karypis, Dave Bechberger, and Karthik Bharathy introduce this new capability. As a result, you can now create, train, and apply ML models on Neptune data in hours instead of weeks, without needing to learn new tools and ML technologies. Any developer with data in Neptune can now use ML on their graphs.
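To make the idea of a GNN a little more concrete, here is a toy, pure-Python sketch of one round of message passing, the basic operation most GNN layers build on: each vertex's new representation is computed from its own features and those of its neighbors. This is not DGL or Neptune ML code; the mean aggregation, the fixed weight matrix, and the function names are illustrative assumptions (in a real GNN, the weights are learned during training).

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, vec: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def message_passing_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    weight: Matrix,
) -> Dict[str, Vector]:
    """One toy GNN layer: mean-aggregate each vertex with its neighbors,
    then apply a linear transform followed by ReLU."""
    updated: Dict[str, Vector] = {}
    for vertex, own in features.items():
        # Gather the vertex's own features plus its neighbors' features.
        group = [own] + [features[u] for u in neighbors.get(vertex, [])]
        # Element-wise mean across the gathered feature vectors.
        mean = [sum(column) / len(group) for column in zip(*group)]
        # Linear transform, then ReLU (clamp negatives to zero).
        updated[vertex] = [max(0.0, x) for x in matvec(weight, mean)]
    return updated


if __name__ == "__main__":
    # A three-vertex path graph a - b - c with 2-dimensional features.
    features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(message_passing_layer(features, neighbors, identity))
```

Stacking several such layers lets information from vertices several hops away influence each vertex's representation, which is what makes GNNs useful for predicting vertex properties.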

In How to get started with Neptune ML, the same authors show you how you can easily set up Neptune ML and infer properties of vertices within a graph. You will get a hands-on view of using a new open source project from the Amazon Neptune team, graph-notebooks.

Amazon SageMaker Edge Manager

Amazon SageMaker Edge Manager Simplifies Operating Machine Learning Models on Edge Devices is a post from Julien Simon, in which he takes a look at this new service that makes it easier to optimize, secure, monitor, and maintain machine learning models on a fleet of edge devices. Starting from a model that you trained in or imported into Amazon SageMaker, SageMaker Edge Manager first optimizes the model for your hardware platform using Amazon SageMaker Neo. It then packages the model and stores it in Amazon Simple Storage Service (Amazon S3), from which it can be deployed to your devices. You can deploy multiple models, which are loaded and used for predictions by a runtime optimized for your hardware of choice. The post provides a detailed rundown of how this works and includes a customer story from Lenovo.

Amazon SageMaker Clarify

New – Amazon SageMaker Clarify Detects Bias and Increases the Transparency of Machine Learning Models is a post on the AWS News Blog in which Julien Simon introduces this new capability within Amazon SageMaker. It helps customers detect bias in machine learning models and increase transparency by helping explain model behavior to stakeholders and customers. As part of this initiative, we have open sourced a key part of this capability, so you can use the same approach during local development. The GitHub repository for this open source project is amazon-sagemaker-clarify. Simon has also put together a short video that demonstrates how to compute bias metrics on your datasets with the amazon-sagemaker-clarify open source Python package, running a simple example both on his local machine and within Amazon SageMaker. Additionally, How Clarify helps machine learning developers detect unintended bias, a post on the Amazon Science Blog, provides further information on how Amazon SageMaker Clarify works. It is well worth a read if you want a good grounding in this subject.
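For a feel of what bias metrics look like, the sketch below computes two of the pre-training metrics described in the SageMaker Clarify documentation, class imbalance (CI) and difference in proportions of labels (DPL), by hand in plain Python. It does not use the amazon-sagemaker-clarify package, whose API is not shown here; the function names and the boolean facet encoding (True for facet a, False for facet d) are assumptions for illustration.

```python
from typing import Sequence, Tuple


def _facet_sizes(facet: Sequence[bool]) -> Tuple[int, int]:
    """Return (n_a, n_d): how many records fall in facet a and facet d."""
    n_a = sum(1 for in_a in facet if in_a)
    n_d = len(facet) - n_a
    if n_a == 0 or n_d == 0:
        raise ValueError("both facets must contain at least one record")
    return n_a, n_d


def class_imbalance(facet: Sequence[bool]) -> float:
    """CI = (n_a - n_d) / (n_a + n_d), a value between -1 and 1."""
    n_a, n_d = _facet_sizes(facet)
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(
    facet: Sequence[bool], labels: Sequence[int]
) -> float:
    """DPL = q_a - q_d, where q_x is the fraction of positive (1) labels in facet x."""
    if len(facet) != len(labels):
        raise ValueError("facet and labels must have the same length")
    n_a, n_d = _facet_sizes(facet)
    pos_a = sum(1 for in_a, y in zip(facet, labels) if in_a and y == 1)
    pos_d = sum(1 for in_a, y in zip(facet, labels) if not in_a and y == 1)
    return pos_a / n_a - pos_d / n_d


if __name__ == "__main__":
    # Six records in facet a, two in facet d.
    facet = [True] * 6 + [False] * 2
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    print(class_imbalance(facet))  # 0.5
    print(difference_in_proportions_of_labels(facet, labels))  # 0.5
```

A CI near 0 means both facets are similarly represented in the dataset, and a DPL near 0 means positive labels occur at similar rates in both facets.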

Amazon EMR on Amazon EKS

In New – Amazon EMR on Amazon Elastic Kubernetes Service (EKS), Channy Yun announces the general availability of Amazon EMR on Amazon EKS, a new deployment option in EMR that allows customers to automate the provisioning and management of open source big data frameworks on Amazon EKS. With EMR on EKS, customers can run Spark applications alongside other types of applications on the same EKS cluster to improve resource utilization and simplify infrastructure management.


For a recap of last week’s sessions, you can check the session lineup via the re:Invent on-demand capability. Here is a handy list of some of the highlights from week two.

The serverless LAMP stack session builds on a series of articles from Ben Smith, in which he takes the LAMP stack (Linux, Apache, MySQL, and PHP) and walks you through a number of modernization approaches on AWS. One of those modernization techniques is adopting serverless. In this session, you will learn about the serverless LAMP stack and how to use your favorite open source frameworks, such as Laravel, to build modern, serverless PHP apps.

Building real-time applications using Apache Flink is a session presented by Steffen Hausman, in which he shares how to build real-time applications using Apache Flink with Apache Kafka and Amazon Kinesis Data Streams. Apache Flink is a framework and engine for building streaming applications for use cases such as real-time analytics and complex event processing. Hausman covers best practices for building low-latency Apache Flink applications that read data from either Amazon Managed Streaming for Apache Kafka (Amazon MSK) or Amazon Kinesis Data Streams, explains how to run these applications using Amazon Kinesis Data Analytics, and discusses AWS's open source contributions.
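Windowed aggregation is one of the core building blocks of the real-time analytics use cases mentioned above. The following plain-Python sketch (not Flink's API) counts events per key in fixed, non-overlapping tumbling windows and emits each window's result as soon as a later window begins. It assumes events arrive in timestamp order; Flink handles out-of-order events with watermarks, which this sketch deliberately omits. The function name and event shape are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (event timestamp in seconds, key)


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, {key: count}) for consecutive tumbling windows.

    Assumes events arrive in non-decreasing timestamp order, so a window can
    be emitted as soon as an event for a later window shows up. Windows with
    no events are not emitted.
    """
    current_start: Optional[int] = None
    counts: Dict[str, int] = {}
    for ts, key in events:
        start = ts - ts % window_seconds  # start of the window this event falls in
        if current_start is not None and start != current_start:
            yield current_start, counts  # previous window is complete
            counts = {}
        current_start = start
        counts[key] = counts.get(key, 0) + 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window


if __name__ == "__main__":
    stream = [(1, "click"), (3, "view"), (9, "click"), (12, "click"), (25, "view")]
    for window_start, window_counts in tumbling_window_counts(stream, 10):
        print(window_start, window_counts)
```

Emitting results incrementally, rather than after all input has been read, is what distinguishes this from a batch group-by and keeps latency bounded by the window size.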

Building end-to-end ML workflows with Kubeflow Pipelines is a session from Antje Barth on Kubeflow, a popular open source ML toolkit for Kubernetes users. In this session, she talks about Kubeflow Pipelines, an add-on to Kubeflow that lets you build and deploy portable and scalable end-to-end ML workflows. She covers how to get started with Kubeflow Pipelines and how you can integrate powerful Amazon SageMaker features such as data labeling, large-scale hyperparameter tuning, distributed training jobs, and secure and scalable model deployment using SageMaker Components for Kubeflow Pipelines.

In Open source meets SaaS: Accelerating the path to SaaS adoption, Tod Golding introduces a new open source solution that was announced during the AWS Partner keynote in the first week of re:Invent. He shares the solution's underlying architecture and the ways it lets you use an open source environment to accelerate your path to a SaaS delivery model.

In Open-source failure injection on AWS, colleagues Adrian Hornsby and Varun Jewalikar from Amazon Prime Video shared an open source approach to failure injection on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Container Service (Amazon ECS) using AWS Systems Manager, and discussed how Prime Video combines this approach with load testing to reach higher levels of resiliency. This is a great session, and Adrian's demos really bring it to life. Failure injection is an increasingly important priority for businesses that depend more and more on their digital systems.

Next up is the session Running Apache Cassandra workloads with Amazon Keyspaces. In this session, Arturo Hinojosa and Mihir Desai explain what Apache Cassandra is and take a look at Amazon Keyspaces (for Apache Cassandra), covering its architecture and key features, including data encryption by default, continuous backups using point-in-time recovery, and serverless throughput and storage management. Finally, they show how to design and visualize Amazon Keyspaces data models by using NoSQL Workbench.

If you are using AWS Copilot, an open source command-line tool that makes building, developing, and operating containerized applications on AWS more efficient, then you will enjoy AWS Copilot: Simplifying container development. Efe Karakus demonstrates simplifying container development through the example of deploying a new voting application, all while explaining how AWS Copilot works and how it might help you.

Other sessions

In Deep dive on AWS Glue Elastic Views, Akshat Vig and Almann Goo introduce this new capability in AWS Glue that makes use of the open source project PartiQL, a SQL-compatible query language that makes it easy to efficiently query data, regardless of where or in what format it is stored. This is a great session, and I can see a lot of developers finding these new capabilities really helpful.

AWS re:Invent recap: Deep dive on Amazon FSx for Lustre + Amazon S3 by Darryl Osborne provides more info from his session last week. If you missed the session, the recap explains how fast, shared file systems can help your compute workloads achieve peak performance and reduce costs. Amazon FSx for Lustre is a fully managed, POSIX-compliant shared file system that provides high-performance storage for Amazon EC2 compute resources without the overhead and complexity of a self-managed file system.

Andy Hopper presented a great session last week on ways to modernize your existing .NET Framework applications. Porting .NET Framework applications to .NET Core on Linux walks you through this task using the Porting Assistant for .NET. Hopper introduces the open source tool and walks you through modernizing a simple example application, showing how the tool can simplify the process and make it more straightforward.

If you want to know more about AWS Controllers for Kubernetes (ACK), then Define AWS service resources with AWS Controllers for Kubernetes by Jay Pipes is just what you need. ACK lets you define your application's AWS managed service resources using the Kubernetes API and manifests, so there is no need to use a different configuration system or sign in to the AWS Management Console. The session covers the design of ACK, the features it provides, and the roadmap for service integrations.
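As a rough illustration of the idea, the sketch below builds the kind of manifest you would hand to Kubernetes to ask ACK for an Amazon S3 bucket, and prints it as JSON (which kubectl accepts as well as YAML). The apiVersion s3.services.k8s.aws/v1alpha1, the Bucket kind, and the spec.name field reflect the ACK S3 controller as we understand it from its early releases and may differ in the version you install, so check the controller's documentation. The helper function and resource names are hypothetical.

```python
import json


def ack_s3_bucket_manifest(
    resource_name: str, bucket_name: str, namespace: str = "default"
) -> dict:
    """Build a Kubernetes custom resource asking the ACK S3 controller for a bucket.

    The apiVersion/kind/spec fields are assumptions based on early ACK S3
    controller releases; verify them against the controller you install.
    """
    return {
        "apiVersion": "s3.services.k8s.aws/v1alpha1",
        "kind": "Bucket",
        "metadata": {"name": resource_name, "namespace": namespace},
        "spec": {"name": bucket_name},  # the S3 bucket name to create
    }


if __name__ == "__main__":
    manifest = ack_s3_bucket_manifest("my-app-bucket", "my-app-bucket-example-123")
    # Save this output as bucket.json and apply it with: kubectl apply -f bucket.json
    print(json.dumps(manifest, indent=2))
```

The appeal of this design is that the Kubernetes API becomes the single place where both your workloads and the AWS resources they depend on are declared.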

We also had one of our customers talk about their latest open source project. In Untangling multi-account management with ConsoleMe, Netflix shared how ConsoleMe simplifies IAM permissions management by showing Netflix cloud resources in a single interface. This new project provides a multi-step, dynamic, self-service wizard, which determines permissions, generates resource policies automatically, and uses Zelkova (an AWS service that uses automated reasoning to analyze policies and the future consequences of policies) to intelligently apply low-risk permission requests. ConsoleMe also brokers application AWS credentials to provide users with short-lived IAM credentials for testing and development. Curtis Castrapel, a Senior Security Software Engineer at Netflix, walks you through this project including providing a demo of some of the features.
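ConsoleMe's policy generation is far richer than anything that fits here, but its end product is a standard IAM policy document. As a hedged illustration only, the hypothetical helper below turns a self-service request (a bucket, a key prefix, and a list of S3 object actions) into a narrowly scoped policy and rejects wildcard actions. It is not ConsoleMe's code or API, and it does not reproduce the Zelkova-based risk analysis described in the session.

```python
import json
from typing import Iterable


def s3_object_access_policy(
    bucket: str, actions: Iterable[str], prefix: str = ""
) -> dict:
    """Build an IAM policy granting only the given S3 object actions under bucket/prefix.

    Hypothetical helper for illustration; bucket-level actions such as
    s3:ListBucket need the bucket ARN instead and are out of scope here.
    """
    action_list = sorted(set(actions))  # de-duplicate for a tidy, stable policy
    if not action_list:
        raise ValueError("at least one action is required")
    for action in action_list:
        if "*" in action:
            raise ValueError(f"wildcard actions are not allowed: {action}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": action_list,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    policy = s3_object_access_policy(
        "example-reports-bucket", ["s3:GetObject", "s3:PutObject"], "team-a/"
    )
    print(json.dumps(policy, indent=2))
```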

The final session I want to recommend is Michael Labieniec’s session on Flutter and AWS Amplify. In Build iOS & Android mobile apps in record time with Flutter and AWS Amplify, Labieniec grounds you in AWS Amplify and its capabilities before providing an overview of Flutter and then walking you through a coding demo of how you can build Flutter applications with AWS Amplify in no time at all.

Other highlights

Be sure to check out the first week’s highlights as well as those from week three.

Ricardo Sueiras

Cloud Evangelist at AWS. Enjoy most things where technology, innovation and culture collide into sometimes brilliant outcomes. Passionate about diversity and education and helping to inspire the next generation of builders and inventors with Open Source.