Keeping Open Source Open – Open Distro for Elasticsearch
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. Visit the website to learn more.
At AWS, we focus on solving problems for customers. Over the years, customer use of and dependence on open source technologies have steadily increased; this is why we've long been committed to open source, and our pace of contributions to open source projects, both our own and others', continues to accelerate.
When AWS launches a service based on an open source project, we are making a long-term commitment to support our customers. We contribute bug fixes and security, scalability, performance, and feature enhancements back to the community. For example, we have been a significant contributor to Apache Lucene, which powers Amazon Elasticsearch Service. The Amazon EMR team has been contributing to the Hadoop ecosystem for many years, and the Amazon Elastic Container Service for Kubernetes (EKS) team has been contributing to Kubernetes. We also invest in open source communities by training developers and operators and by sponsoring open source events and conferences such as ApacheCon and KubeCon, and we recently increased our support of the Apache Software Foundation. Marketing support helps communities by growing the number of end users and contributors, and it accelerates the adoption of open source projects.
Many reasons drive our active participation in open source communities. First, it's important to support healthy communities so that projects continue to develop and stay relevant. Second, maintaining an internal fork of a project wastes effort and can delay service updates while upstream changes are merged. Third, releasing new ideas as open source draws others to help move those ideas into the mainstream. Fourth, open source collaboration across companies and academic institutions has produced some of the most significant breakthroughs in areas like artificial intelligence.
To get these benefits, customers must be able to trust that open source projects stay open. The maintainers of open source projects have the responsibility of keeping the source distribution open to everyone and not changing the rules midstream. When important open source projects that AWS and our customers depend on begin restricting access, changing licensing terms, or intermingling open source and proprietary software, we will invest to sustain the open source project and community. For example, our customers recently grew concerned that Oracle would stop supporting the version of Java they relied upon, or would change the licensing terms, and they had good reason to be concerned. We responded by offering the Corretto project, a no-cost, multi-platform, production-ready distribution of OpenJDK from Amazon. We invested to provide long-term consistency and confidence by committing that Amazon will distribute security updates to Corretto 8 at no cost until at least June 2023, and to Corretto 11 until at least August 2024. Corretto is a free, supported distribution that the community can now depend on, while in parallel we continue to support and contribute directly to OpenJDK.
Unfortunately, we are seeing other examples where open source maintainers are muddying the waters between the open source community and the proprietary code they create to monetize the open source. At AWS, we believe that maintainers of an open source project have a responsibility to ensure that the primary open source distribution remains open and free of proprietary code, so that the community can build on the project freely and the distribution does not advantage any one company over another. This was part of the promise the maintainer made when they gained developers' trust to adopt the software. When the core open source software is completely open for anyone to use and contribute to, the maintainer (and anyone else) can and should be able to build proprietary software to generate revenue. However, that proprietary software should be kept separate from the open source distribution so as not to confuse downstream users, to preserve anyone's ability to innovate on top of the open source project, and to avoid creating ambiguity in the licensing of the software or restricting access for specific classes of users.
If we look closely at successful open source projects, many of them have benefited from access to unfettered open source software. In fact, those projects arguably would not exist today without the ability to quickly assemble and innovate on top of pre-existing open source software. For example, a significant enabler of Elasticsearch is Apache Lucene, an Apache Software Foundation project that predates Elasticsearch by 11 years. Elasticsearch also leverages many other permissively licensed open source projects, such as Jackson for JSON parsing, Netty as the web container, and many more. The point is that open source software enables individuals and businesses to innovate faster, and downstream consumers depend on that ability. When maintainers create confusion about the long-term viability of the open source, it affects all downstream consumers.
Elasticsearch has played a key role in democratizing analytics of machine-generated data. It has become increasingly central to the day-to-day productivity of developers, security analysts, and operations engineers worldwide. Its permissive Apache 2.0 license enabled it to gain adoption quickly and allowed unrestricted use of the software. Unfortunately, since June 2018, we have witnessed significant intermingling of proprietary code into the code base. While an Apache 2.0-licensed download is still available, there is an extreme lack of clarity about what customers who care about open source are getting and what they can depend on. For example, neither the release notes nor the documentation make clear what is open source and what is proprietary. Enterprise developers may inadvertently apply a fix or enhancement to the proprietary source code. This is hard to track and govern, and it could lead to a breach of license and to immediate termination of rights (for both the free and the paid proprietary offerings). Individual code commits also increasingly contain both open source and proprietary code, making it very difficult for developers who want to work only on open source to contribute and participate. In addition, the innovation focus has shifted from furthering the open source distribution to making the proprietary distribution popular. As a result, the majority of new Elasticsearch users are now, in fact, running proprietary software. We have discussed our concerns with Elastic, the maintainers of Elasticsearch, including offering to dedicate significant resources to help support a community-driven, non-intermingled version of Elasticsearch. They have made it clear that they intend to continue on their current path.
Meanwhile, customers and partners have told us that these changes concern them as well. The changes have created uncertainty about the longevity of the open source project, which is receiving less innovation focus. Customers also want the freedom to run the software anywhere and to self-support at any point if they need to. We have therefore decided to partner with others, such as Expedia Group and Netflix, to create a new open source distribution of Elasticsearch named "Open Distro for Elasticsearch." Open Distro for Elasticsearch is a 100% open source, value-added distribution focused on driving innovation, so that users have a feature-rich option that is fully open source.
“Open source software and the freedoms it provides are important to Expedia Group,” said Subbu Allamaraju, VP Cloud Architecture at Expedia Group. “We are excited about the Open Distro for Elasticsearch initiative, which aims to accelerate the feature set available to open source Elasticsearch users like us. This initiative also helps in reassuring our continued investment in the technology.”
“At Netflix, we are committed to open source. We are both major users and contributors to open source,” said Christian Kaiser, VP Platform Engineering at Netflix. “Open Distro for Elasticsearch will allow us to freely contribute to an Elasticsearch distribution, that we can be confident will remain open source and community-driven.”
As was the case with Java and OpenJDK, our intention is not to fork Elasticsearch, and we will contribute back to the Apache 2.0-licensed Elasticsearch upstream project as we develop add-on enhancements to the base open source software. The first release will include many new advanced, yet completely open source, features, including encryption-in-transit, user authentication, detailed auditing, granular role-based access control, event monitoring and alerting, deep performance analysis, and SQL support.
The new advanced features of Open Distro for Elasticsearch are all Apache 2.0 licensed. With the first release, our goal is to address many critical features missing from open source Elasticsearch, such as security, event monitoring and alerting, and SQL support. We think these features will be exciting and valuable to developers and will encourage them to download, collaborate, and ultimately, contribute to the community. Many of these features are ones that we have been working on for inclusion in Amazon Elasticsearch Service. Open Distro for Elasticsearch enables users to run the same feature-rich distribution anywhere they wish, such as on-premises, on laptops, or in the cloud.
Our aim for Open Distro for Elasticsearch is to provide developers with the freedom to contribute to open source value-added features on top of the Apache 2.0-licensed Elasticsearch upstream project. We plan to contribute patches to the open source Elasticsearch base back upstream for the benefit of all. Open Distro for Elasticsearch will welcome developers and contributors from across the industry to invest in these important technologies with the confidence that they will always remain open source and permissively licensed. The whole idea of open source is that multiple users and companies can put it to work and everyone can contribute to its improvement. Open Distro for Elasticsearch is consistent with our commitment to make the necessary investments to keep open source truly open and enable anyone to benefit from our contributions.
You can download, begin using, and contribute to Open Distro for Elasticsearch today. The security features in this initial release include encryption-in-transit; native Active Directory, LDAP, and OpenID authentication; role-based and granular access control; and audit logging. Other key features include integrated event monitoring and alerting that uses the full flexibility of the Elasticsearch query language to notify you of changes in your data, SQL support (including REST and JDBC access), and an advanced performance analyzer. To download and learn more about Open Distro for Elasticsearch, visit https://opendistro.github.io/for-elasticsearch/.
For more details, see Jeff Barr’s post New – Open Distro for Elasticsearch.
photo credit: taken by Adrian Cockcroft at Petra, March 10, 2019