AWS Big Data Blog

Simplifying and modernizing home search at Compass with Amazon Elasticsearch Service

Amazon Elasticsearch Service (Amazon ES) is a fully managed service that makes it easy for you to deploy, secure, and operate Elasticsearch in AWS at scale. It’s a widely used service, and customers integrate it into their applications for a variety of search use cases.

Compass rearchitected their search solution with AWS services, including Amazon ES, to deliver high-quality property searches and saved searches for their customers.

In this post, we describe how Compass’s search solution evolved, the challenges and benefits they found with each architecture, and how Amazon ES gives them a scalable long-term solution. We also show how Amazon Managed Streaming for Apache Kafka (Amazon MSK) helped them build event-driven, real-time streaming of property listing data. You can apply this solution to similar use cases.

Overview of Amazon ES

Amazon ES makes it easy to deploy, operate, and scale Elasticsearch for log analytics, application monitoring, interactive search, and more. It’s a fully managed service that delivers the easy-to-use APIs and real-time capabilities of Elasticsearch along with the availability, scalability, and security required by real-world applications. It offers built-in integrations with other AWS services, including Amazon Kinesis, AWS Lambda, and Amazon CloudWatch, and third-party tools like Logstash and Kibana, so you can go from raw data to actionable insights quickly.

Amazon ES also has the following benefits:

  • Fully managed – Launch production-ready clusters in minutes. No more patching, versioning, and backups.
  • Access to all data – Capture, retain, correlate, and analyze your data all in one place.
  • Scalable – Resize your cluster with a few clicks or a single API call.
  • Secure – Deploy into your VPC and restrict access using security groups and AWS Identity and Access Management (IAM) policies.
  • Highly available – Replicate across Availability Zones, with monitoring and automated self-healing.
  • Tightly integrated – Seamless data ingestion, security, auditing, and orchestration.

Overview of Amazon MSK

Amazon MSK is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications. With Amazon MSK, you can use native Apache Kafka APIs to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications.

Overview of Compass

Urban Compass, Inc. (Compass) is a global real estate technology company. The company provides an online platform that supports buying, renting, and selling real estate assets.

In their own words, “Compass is building the first-of-its-kind modern real estate platform, pairing the industry’s top talent with technology to make the search and sell experience intelligent and seamless. Compass operates in over 24 markets, with over 2 billion in sales this year to date, with 2,300 employees and over 15,000 agents with a vision to find a place for everybody in the world.”

How Compass uses search to help customers

Search is one of Compass’s primary features, which enables its website visitors and agents to look for properties within the platform. The Compass platform has the following search components:

  • Search services – Uses Amazon ES extensively to power searches across hyperlocal real estate data (spanning thousands of attributes). This search component acquires data through Amazon MSK after it’s processed in Apache Spark and stored in Amazon Aurora PostgreSQL.
  • Agent and consumer search – A frontend built on top of search services, which works as an interface between agents, consumers, and the Compass search services. It’s built in React and enables you to seamlessly search real estate data and apply hyperlocal filters.
  • Saved search – As a consumer, when you run a search and save it, the saved search indexes are updated with your search parameters. When a new listing comes into the system (indexed through the listings Elasticsearch index), the Compass search identifies the saved searches that match the new listing using the percolate feature of Elasticsearch and notifies you of the new property listing.

Evolution of the consumer search

The following sections describe the Compass primary search feature in detail and how its architecture evolved over time, starting with its initial Apache Lucene architecture.

Previous architecture with Apache Lucene

Compass initially implemented its search functionality by integrating directly with Lucene, which ran on manually provisioned AWS virtual machines and was installed and configured by hand.

In the following architecture diagram, data from the MLS (Multiple Listing Service) system is pushed through a common extract, transform, and load (ETL) framework and loaded into the listing database. From there, the data is pushed to the Lucene cluster, where the frontend search REST APIs query it.

This architecture had several pain points, many of which involved heavy lifting with Lucene:

  • Manual maintenance and configuration required specialized knowledge
  • Challenges around disk forecasting and growth
  • Scalability and high performance depended on manual sharding
  • Configuring new data fields was labor-intensive

New architecture with Amazon ES

To overcome these challenges, Compass started using Amazon ES.

The following architecture diagram shows the evolution of Compass’s system. Compass ran Amazon ES and Lucene in parallel, shadowing production traffic to verify and quality-check the results. They ran the services side by side for two months, until they were confident that Amazon ES returned the same results.
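One way to make a shadowing comparison concrete is to compare, for each shadowed query, the ranked listing IDs returned by both backends. The following is a minimal sketch under that assumption; the function name `compare_results` and the two metrics it reports (top-k overlap and whether the first hit matches) are hypothetical illustrations, not Compass’s actual QA tooling.

```python
from dataclasses import dataclass


@dataclass
class ShadowComparison:
    overlap: float          # fraction of top-k IDs returned by both backends
    same_top_hit: bool      # whether both backends ranked the same listing first
    only_in_primary: list   # IDs returned only by the primary (Lucene) backend
    only_in_shadow: list    # IDs returned only by the shadow (Amazon ES) backend


def compare_results(primary_ids, shadow_ids, k=10):
    """Compare the top-k listing IDs from the primary and shadow backends."""
    top_primary = primary_ids[:k]
    top_shadow = shadow_ids[:k]
    primary_set = set(top_primary)
    shadow_set = set(top_shadow)
    common = primary_set & shadow_set
    # Guard against division by zero when both result lists are empty.
    denom = max(len(top_primary), len(top_shadow), 1)
    return ShadowComparison(
        overlap=len(common) / denom,
        same_top_hit=bool(top_primary) and bool(top_shadow)
        and top_primary[0] == top_shadow[0],
        only_in_primary=[i for i in top_primary if i not in shadow_set],
        only_in_shadow=[i for i in top_shadow if i not in primary_set],
    )
```

For example, `compare_results(["a", "b", "c"], ["a", "c", "d"], k=3)` reports an overlap of 2/3, the same top hit, and one listing unique to each side. Aggregating such reports over the shadowing period gives a measurable basis for deciding when the new backend matches the old one.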

In addition, Compass added full-text search (description search) immediately after switching to Amazon ES. As a result, listings and features reached their customers faster, which allowed Compass to expand nationally and implement localized search.
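Description search maps naturally onto a standard Elasticsearch full-text query. As a sketch, assuming hypothetical `description`, `city`, and `price` fields in the listings index, a request body could combine a scored `match` clause on the description text with unscored filters:

```python
def build_description_search(text, city=None, max_price=None, size=20):
    """Build a query body for full-text search over listing descriptions.

    The field names (description, city, price) are hypothetical placeholders.
    """
    filters = []
    if city is not None:
        # "term" matches the exact value of a keyword field.
        filters.append({"term": {"city": city}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "size": size,
        "query": {
            "bool": {
                # "match" analyzes the input text and scores listings by relevance;
                # operator "and" requires every analyzed term to appear.
                "must": [{"match": {"description": {"query": text, "operator": "and"}}}],
                # Filter clauses narrow the results without affecting scores.
                "filter": filters,
            }
        },
    }
```

Keeping the location and price constraints in `filter` rather than `must` means they only restrict the result set, so relevance ranking is driven by the description text alone.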

Enhanced architecture with Amazon MSK

Compass further enhanced its architecture with Amazon MSK, which lets different teams process data in parallel and push transformed events to the Kafka cluster. The following diagram shows the enhanced architecture.
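When several producers publish listing events to a shared topic, the record key determines partitioning. The following minimal sketch assumes each transformed event carries a hypothetical `listing_id` field and uses it as the key; the serialization format is also an assumption, not Compass’s documented schema.

```python
import json


def to_kafka_record(event):
    """Serialize a transformed listing event into a (key, value) pair of bytes.

    The key is the hypothetical listing_id field. Kafka's default partitioning
    sends records with the same key to the same partition (while the partition
    count is unchanged), so updates for one listing are consumed in the order
    they were written to that partition.
    """
    if "listing_id" not in event:
        raise ValueError("event is missing listing_id")
    key = str(event["listing_id"]).encode("utf-8")
    # Compact, key-sorted JSON keeps the payload small and deterministic.
    value = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return key, value
```

The resulting pair could then be handed to a Kafka producer client, for example `KafkaProducer.send(topic, key=key, value=value)` in the kafka-python library.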

With the new architecture, Compass search saw some immediate wins:

  • Reduced maintenance costs because Amazon ES is a managed service and you don’t need to take on the overhead of cluster administration or management
  • Faster indexing, because index build time dropped from 8 hours to 1 hour
  • Ability to create or tear down clusters with just one click, which eased maintenance

Because of the Amazon ES user interface and monitoring capabilities, the Compass team could easily assess usage patterns and perform capacity planning and cost prediction.

Implementing saved searches with the Elasticsearch percolator

The Elasticsearch percolator inverts the usual query-document pattern: you index queries instead of documents, and then query with documents against the indexed queries. The results of a percolate query are the stored queries that match the document.
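The inversion can be seen in the request bodies involved, written here as Python dictionaries that would be sent as JSON. This is a minimal sketch: the index layout and the field names (`user_id`, `city`, `bedrooms`, `price`) are hypothetical, and only the `percolator` field type and the `percolate` query come from Elasticsearch itself. The mapping is typeless, as in Elasticsearch 7.x.

```python
# Hypothetical percolator index for saved searches. The "query" field stores each
# saved search; the listing fields that stored queries reference must also be
# mapped so Elasticsearch can parse those queries.
SAVED_SEARCH_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "user_id": {"type": "keyword"},
            "city": {"type": "keyword"},
            "bedrooms": {"type": "integer"},
            "price": {"type": "long"},
        }
    }
}


def saved_search_doc(user_id, city, min_bedrooms, max_price):
    """Store a user's search criteria as a query document (the inverted side)."""
    return {
        "user_id": user_id,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"city": city}},
                    {"range": {"bedrooms": {"gte": min_bedrooms}}},
                    {"range": {"price": {"lte": max_price}}},
                ]
            }
        },
    }


def percolate_body(listing):
    """Search the stored queries with a listing document, instead of the reverse."""
    return {"query": {"percolate": {"field": "query", "document": listing}}}
```

The mapping would create the saved search index, each `saved_search_doc` result would be indexed as an ordinary document, and `percolate_body(listing)` would be sent to that index’s `_search` endpoint to find the saved searches a listing satisfies.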

Compass used this feature to implement saved searches by indexing each user’s search queries. When a property listing arrives, Compass uses Amazon ES to retrieve the matching queries and notifies the corresponding customers.
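After a listing is percolated, each hit in the response is a stored saved-search document. A small sketch of the notification step, assuming each stored saved search carries a hypothetical `user_id` field in its `_source`, groups the matching saved searches by user so that each customer is notified once:

```python
def matches_by_user(percolate_response):
    """Group matching saved-search IDs by user from a percolate response.

    Each hit is a stored saved-search document; a user with several matching
    saved searches appears once, with all of their matching search IDs.
    """
    matches = {}
    for hit in percolate_response.get("hits", {}).get("hits", []):
        user_id = hit.get("_source", {}).get("user_id")
        if user_id is None:
            continue  # skip stored queries that have no owner recorded
        matches.setdefault(user_id, []).append(hit["_id"])
    return matches
```

Grouping by user before sending notifications avoids messaging the same customer several times when one listing satisfies more than one of their saved searches.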

The following diagram illustrates this workflow.

Before percolate, Compass periodically reran saved search queries to check for new matches. Percolate lets them notify users promptly when new listings match their searches. On average, Compass stores 250,000 saved search queries.

When new listings arrive, Compass submits an average of 5–10 bulk requests per minute, each containing 1,000 documents (roughly 5,000–10,000 documents per minute). The maximum latency on the Amazon ES side ranges from 750 to 2,500 milliseconds on an 18-node cluster of m5.12xlarge instances.
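Assuming these are Elasticsearch `_bulk` indexing requests against the listings index, a batch of 1,000 documents is a newline-delimited JSON body with one action line and one source line per document. The following sketch builds such bodies; the `listing_id` field used as the document ID is hypothetical.

```python
import json


def bulk_bodies(listings, index="listings", batch_size=1000):
    """Yield newline-delimited JSON bodies for the Elasticsearch _bulk API.

    Each listing becomes an action line plus a source line; every body ends with
    a trailing newline, which the _bulk API requires.
    """
    batch = []
    for listing in listings:
        action = {"index": {"_index": index, "_id": str(listing["listing_id"])}}
        batch.append(json.dumps(action))
        batch.append(json.dumps(listing))
        # Two lines per document, so a full batch holds 2 * batch_size lines.
        if len(batch) == 2 * batch_size:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"
```

For example, 2,500 listings produce three bodies of 1,000, 1,000, and 500 documents. Using the listing ID as `_id` makes a re-sent listing overwrite its earlier version instead of creating a duplicate.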

The following diagram shows the saved search architecture. Search criteria are saved in the saved search indexes. When a new listing arrives through the MLS and Amazon MSK listing stream, it triggers the percolate processor, which pushes a message to the Amazon MSK saved search match topic. The saved search service then consumes that message and pushes notifications to the end user.
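A percolate processor can check a whole batch of listings in one request by using the `documents` parameter of the `percolate` query (Elasticsearch 6.1 and later). Each hit then reports which input documents it matched, as zero-based positions in `fields._percolator_document_slot`. The sketch below builds such a request and turns the response into messages for the match topic; the message fields and the `listing_id` and `user_id` source fields are hypothetical, and publishing to Amazon MSK is not shown.

```python
def percolate_batch_body(listings):
    """Build one percolate query that checks a batch of listings at once."""
    return {"query": {"percolate": {"field": "query", "documents": listings}}}


def match_messages(listings, percolate_response):
    """Turn a batch percolate response into messages for a match topic.

    With "documents", each hit lists the batch positions it matched in
    fields._percolator_document_slot, which map back to the input listings.
    """
    messages = []
    for hit in percolate_response.get("hits", {}).get("hits", []):
        slots = hit.get("fields", {}).get("_percolator_document_slot", [])
        for slot in slots:
            messages.append({
                "saved_search_id": hit["_id"],
                "user_id": hit.get("_source", {}).get("user_id"),
                "listing_id": listings[slot]["listing_id"],
            })
    return messages
```

Emitting one message per (saved search, listing) pair keeps the saved search service simple: it only has to consume messages and deliver notifications, without re-running any search logic.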

The architecture has the following benefits:

  • When you search for properties and save your search criteria, the criteria are stored in Amazon ES. As agents add properties that match those criteria, you’re notified that a new matching property is available.
  • Because of the percolate feature, you’re notified as soon as a property is added to the system, which reduced notification lag from 1 hour to 1 minute.
  • Before percolate, the Compass team ran a batch job that matched new property records against saved searches; now they use push-based notifications.

Compass has a great growth path as their platform scales to more listings, agents, and users. They plan to develop the following features:

  • Real-time streaming
  • AI for search ranking
  • Personalized search through AI

Conclusion

In this post, we explained how Compass uses Amazon ES to bring their customers relevant results for their real estate needs. Whether you’re searching in real time for your next listing or using Compass’s saved search to monitor the market, Amazon ES delivers the results you need.

With the effort they saved from managing their Lucene infrastructure, Compass has focused on their business and engineering, which has opened up new opportunities for them.



About the Author

Sakti Mishra is a Data Lab Solutions Architect at AWS. He helps customers architect data analytics solutions, which gives them an accelerated path towards modernization initiatives.

Outside of work, Sakti enjoys learning new technologies, watching movies, and traveling.