AWS Open Source Blog
Category: Analytics
re:Cap part one – open source at re:Invent 2019
As the dust settles after another re:Invent closes, I wanted to put together a quick summary of all the open source-related announcements that happened in the run-up to this year’s re:Invent and during the week itself. If you are interested in open source in mobile web development, devops, containers, security, big data and data analytics, […]
Introducing real-time anomaly detection in Open Distro for Elasticsearch
There is an enormous increase in real-time streaming applications across a wide range of industries such as finance, health, information technology, retail, and the Internet of Things (IoT). Organizations depend on log analytics solutions to detect aberrations in the data and identify critical situations. Examples include finding fraudulent behavior in financial transactions, discovering suspicious IP addresses […]
Gearing up for re:Invent 2019 with Open Distro for Elasticsearch sessions
re:Invent 2019 has a new track this year and it’s all about Open Source! There are lots of great sessions coming up on Open Distro for Elasticsearch and its components such as Alerting, Security, and Performance Analyzer. Join in to learn more and participate in hands-on workshops! Keep a lookout for our sessions on machine […]
Deploying Spark jobs on Amazon EKS
UPDATE, March 2021: This blog post describes how to deploy self-managed Apache Spark jobs on Amazon EKS. AWS now provides a fully managed service with Amazon EMR on Amazon EKS. This new deployment option allows customers to automate the provisioning and management of Spark on Amazon EKS, and benefit from advanced features such as Amazon […]
What Amazon gets by giving back to Apache Lucene
At pretty much any scale, search is hard. It becomes dramatically harder, however, when searching at Amazon scale: think billions of products, complicated by millions of sellers constantly changing those products on a daily basis, with hundreds of millions of customers searching through that inventory at all hours. Although Amazon has powered its product […]
Add Single Sign-On (SSO) to Open Distro for Elasticsearch Kibana using SAML and Okta
Open Distro for Elasticsearch Security implements the web browser single sign-on (SSO) profile of the SAML 2.0 protocol. This enables you to configure federated access with any SAML 2.0 compliant identity provider (IdP). In a prior post, I discussed setting up SAML-based SSO using Microsoft Active Directory Federation Services (ADFS). In this post, I’ll cover […]
Demystifying Elasticsearch shard allocation
At the core of Elasticsearch’s ability to provide a seamless scaling experience lies its ability to distribute its workload across machines. This is achieved via sharding. When you create an index, you set a primary and replica shard count for that index. Elasticsearch distributes your data and requests across those shards, and the shards across your […]
Open Distro for Elasticsearch 1.1.0 released
We are happy to announce that Open Distro for Elasticsearch 1.1.0 is now available for download! Version 1.1.0 includes the upstream open source versions of Elasticsearch 7.1.1, Kibana 7.1.1, and the latest updates for alerting, SQL, security, performance analyzer, and Kibana plugins, as well as the SQL JDBC driver. You can find details on enhancements, […]
Use Elasticsearch’s _rollover API for efficient storage distribution
Many Open Distro for Elasticsearch users manage data life cycle in their clusters by creating an index based on a standard time period, usually one index per day. This pattern has many advantages: ingest tools like Logstash support index rollover out of the box; defining a retention window is straightforward; and deleting old data is […]
Add Single Sign-On to Open Distro for Elasticsearch Kibana Using SAML and ADFS
Open Distro for Elasticsearch Security (Open Distro Security) comes with authentication and access control out of the box. Prior posts have discussed LDAP integration with Open Distro for Elasticsearch and JSON Web Token authentication with Open Distro for Elasticsearch. Security Assertion Markup Language 2.0 (SAML) is an open standard for exchanging identity and security information […]