Amazon CloudSearch (beta)

Amazon CloudSearch is a fully managed search service in the AWS Cloud that lets customers integrate fast, highly scalable search functionality into their applications. With a few clicks in the AWS Management Console, developers create a search domain and upload the data they want to make searchable. Amazon CloudSearch then automatically provisions the required resources and deploys a highly tuned search index.

Amazon CloudSearch scales seamlessly as the amount of searchable data increases or as the query rate changes. Developers can change search parameters, fine-tune search relevance, and apply new settings at any time without having to upload the data again.

Amazon CloudSearch enables customers to offload the administrative burden of operating and scaling a search service. Customers don't have to worry about hardware provisioning, data partitioning, or software patches. Amazon CloudSearch offers low, pay-as-you-go pricing with no up-front expenses or long-term commitments.

Read Amazon Web Services Evangelist Jeff Barr's CloudSearch blog post for more information about how you can start searching in an hour for less than $100 a month.

Get Started with Amazon CloudSearch for Free

If you are new to Amazon CloudSearch, you can get started for free! For more information, see Amazon CloudSearch Free Trial.

Amazon CloudSearch Functionality

Built for high throughput and low latency, Amazon CloudSearch supports a rich set of features including free text search, faceted search, customizable relevance ranking, configurable search fields, text processing options, and near real-time indexing.

To use Amazon CloudSearch, you simply:

  • Create a search domain
  • Configure your search fields
  • Upload your data for indexing, and
  • Submit search requests from your web site or application

Amazon CloudSearch is currently available in the following Regions: US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), and Asia Pacific (Singapore).


Service Highlights

Simple to Configure – You can make your data searchable using the AWS Management Console, API calls, or command line tools. Simply point to a sample set of data, and Amazon CloudSearch automatically proposes a list of index fields and a suggested configuration. It is easy to add or delete fields, and customize search options such as faceting (facets are index fields that represent categories that you want to use to refine and filter search results). You can make configuration changes without re-uploading your data. You can use analytics reports to track search metrics and user behavior.

Automatic Scaling For Data & Traffic – Amazon CloudSearch scales up and down seamlessly as the amount of data or query volume changes. Amazon CloudSearch handles the operational footprint and provisions search instances for you.

Low Latency, High Throughput – Amazon CloudSearch always stores your index in RAM to ensure low latency and high throughput performance even at large scale. Amazon CloudSearch was created from the same A9 technology that powers search on Amazon.com.

Easy Administration – Amazon CloudSearch is a fully-managed search service. Hardware and software provisioning, setup and configuration, software patching, and data partitioning are handled for you.

Rich Search Features – Amazon CloudSearch indexes and searches both structured data and plain text. It includes most search features that developers have come to expect from a search engine, such as faceted search, free text search, Boolean search, customizable relevance ranking, query time rank expressions, field weighting, and sorting of results using any field. Amazon CloudSearch also provides near real-time indexing of document updates.

Low Costs – Amazon CloudSearch is designed to be cost-efficient. You pay low hourly rates, and only for the resources you consume. Amazon CloudSearch offers low total cost of ownership for your search applications compared to operating a search environment on your own.

Secure – Amazon CloudSearch uses strong cryptographic methods to authenticate users and prevent unauthorized control of your domains. Amazon CloudSearch supports HTTPS and includes web service interfaces to configure firewall settings that control network access to your domain.


Pricing

There are no set-up fees or upfront commitments to begin using Amazon CloudSearch. Customers are billed according to their monthly usage across the following dimensions:

  • Search instances
  • Document batch uploads
  • IndexDocuments requests
  • Data transfer

Search Instances

You are billed hourly for each search instance. Amazon CloudSearch currently supports three types of search instances: Small, Large, and Extra Large.

As a managed service, Amazon CloudSearch determines the size and number of search instances required to deliver low latency, high throughput search performance. When you upload your data and configure your index, Amazon CloudSearch builds an index and picks the appropriate initial search instance type to ensure that your index can be stored in RAM.

As your data volume and index grow, CloudSearch will scale your search domain to a larger search instance type (or partition your index across multiple instances if you are already on the largest search instance type). Conversely, when your data volume and index shrink, CloudSearch scales your domain down to fewer search instances (or a smaller search instance type if your index fits on a single partition).

As with data volume, Amazon CloudSearch automatically scales your search domain to meet your traffic demands. When a search instance reaches its maximum CPU utilization, CloudSearch scales up your search domain by adding a search instance to handle the increased traffic. Conversely, when a search instance drops below its minimum CPU utilization, CloudSearch scales down your search domain by removing the additional search instances in order to minimize costs.

Pricing is per instance-hour consumed for each search instance, from the time a search instance is launched until it is terminated. Each partial instance-hour consumed is billed as a full hour.
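The rounding rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the $0.10 hourly rate is the Small search instance rate used in the cost example on this page.

```python
import math
from datetime import datetime
from decimal import Decimal


def billable_instance_hours(launched: datetime, terminated: datetime) -> int:
    """Round running time up to whole hours (a partial hour bills as a full hour)."""
    seconds = (terminated - launched).total_seconds()
    if seconds < 0:
        raise ValueError("terminated must not precede launched")
    return math.ceil(seconds / 3600)


def instance_charge(launched: datetime, terminated: datetime,
                    hourly_rate: Decimal) -> Decimal:
    """Charge for one search instance from launch to termination."""
    return billable_instance_hours(launched, terminated) * hourly_rate


# Usage: an instance that runs for 2 hours 15 minutes is billed for 3 hours.
start = datetime(2013, 1, 1, 0, 0)
stop = datetime(2013, 1, 1, 2, 15)
print(billable_instance_hours(start, stop))
print(instance_charge(start, stop, Decimal("0.10")))
```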

Batch Uploads

You are billed for the total number of document batches uploaded to your search domain. Uploaded documents are automatically indexed.

  • $0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB)
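As a rough sketch of how the batch size limit and the per-batch price interact, the Python below packs documents greedily into batches and prices the result. It assumes "5 MB" means 5 x 1024 x 1024 bytes and ignores the few bytes of JSON framing around each batch; the function names are hypothetical.

```python
from decimal import Decimal

MAX_BATCH_BYTES = 5 * 1024 * 1024          # assumption: 5 MB read as 5 MiB
PRICE_PER_1000_BATCHES = Decimal("0.10")   # from the price list above


def count_batches(doc_sizes, max_bytes=MAX_BATCH_BYTES):
    """Greedily pack documents (sizes in bytes) into batches of at most max_bytes."""
    batches = 0
    remaining = 0  # free space left in the batch currently being filled
    for size in doc_sizes:
        if size > max_bytes:
            raise ValueError("a single document exceeds the batch size limit")
        if size > remaining:
            batches += 1            # start a new batch
            remaining = max_bytes
        remaining -= size
    return batches


def upload_cost(batch_count):
    """Price batch uploads at $0.10 per 1,000 requests."""
    return Decimal(batch_count) / 1000 * PRICE_PER_1000_BATCHES


# Usage: 1,000 documents of 1 KB each fit comfortably in one batch.
print(count_batches([1024] * 1000))
print(upload_cost(1500))
```

Packing batches close to the size limit keeps the number of billed requests low, since the charge is per batch rather than per document.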

IndexDocuments Requests

When you make configuration changes to your index, for example by adding a field, you will need to rebuild the index. To do this, you use the AWS Management Console, command line tools, or APIs to issue an IndexDocuments request. The charge for this request is:

  • $0.98 per GB of data stored in your search domain

Amazon CloudSearch may occasionally issue these calls for you. For example, as you add data to your domain, Amazon CloudSearch may proactively rebuild your index to improve query performance. You will not be charged in this case, and others, where you do not explicitly call IndexDocuments.

Data Transfer

The pricing below is based on data transferred "in" and "out" of Amazon CloudSearch.

Data transferred between Amazon CloudSearch and AWS services in the same region is free.

Data transferred between Amazon CloudSearch and AWS services in different regions will be charged as Internet Data Transfer on both sides of the transfer.

For traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region, you are only charged for the Data Transfer in and out of the Amazon EC2 instances, and standard Amazon EC2 Regional Data Transfer charges apply. For additional information, see the EC2 pricing description.

You can always see the resources you're consuming in Amazon CloudSearch via the Account Activity page on the AWS website, the AWS Management Console, CloudSearch command line tools, or CloudSearch APIs.

Cost Example

Here's a cost example based on the IMDb movie data set. Keep in mind that many different factors can affect the scaling characteristics of a search domain and how much it costs to operate, including the actual values in each field you want to search, the indexing options you configure for your domain, and how much compute power it takes to process your queries.

In the IMDb data set, each movie is represented by a 1 KB document. A Small search instance can hold 1 million 1 KB documents. To calculate how much it will cost to run a search domain for this data, let's assume the following usage levels:

  • 100,000 simple keyword search requests per day
  • 50 batch uploads per day, where each batch adds 1,000 new movies (up to a total of 1 million movies)
  • 4 IndexDocuments requests per month

At these usage levels, Amazon CloudSearch will automatically pick the Small search instance type to deploy the search domain. The monthly cost would be:

  • Small Search Instance: 720 hrs (24 hrs per day x 30 days) x $0.10 per hour = $72.00 per month
  • Batch Uploads: (50/1000) x $0.10 x 30 days = $0.15 per month
  • IndexDocuments (100MB): 0.1GB (amount of data stored in your search domain) x $0.98 per GB x 4 calls per month = $0.39 per month

TOTAL: $72.54/month
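The same arithmetic can be reproduced in Python. This is a minimal sketch that uses only the rates and usage levels quoted in the example; the function and parameter names are hypothetical, and each line item is rounded to the cent before summing, as in the figures above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to two decimal places."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def monthly_cost(days=30,
                 hourly_rate=Decimal("0.10"),           # Small search instance
                 batches_per_day=50,
                 batch_price_per_1000=Decimal("0.10"),
                 stored_gb=Decimal("0.1"),              # 100 MB of indexed data
                 index_price_per_gb=Decimal("0.98"),
                 index_calls=4):
    """Return each line item and the total for one month of usage."""
    instance = to_cents(Decimal(24 * days) * hourly_rate)
    uploads = to_cents(Decimal(batches_per_day) / 1000 * batch_price_per_1000 * days)
    indexing = to_cents(stored_gb * index_price_per_gb * index_calls)
    return {
        "instance": instance,
        "uploads": uploads,
        "indexing": indexing,
        "total": instance + uploads + indexing,
    }


for item, amount in monthly_cost().items():
    print(f"{item}: ${amount}")
```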

Note that when a search domain exceeds the capacity of a Small search instance, it will automatically be scaled up to a Large search instance and incur additional charges. You can monitor your usage and charges from the AWS Account Activity Page.


Detailed Description

Search Instances

You create an Amazon CloudSearch search domain for each collection of data that you want to make searchable. A search domain has one or more search instances, each with a finite amount of RAM and CPU resources for indexing data and processing requests. The number of search instances in a domain depends on the documents in your collection, and the volume and complexity of your search requests.

As a managed search service, Amazon CloudSearch determines the size and number of search instances required to deliver low latency, high throughput search performance. When you upload your data and configure your index, Amazon CloudSearch builds an index and picks the appropriate initial search instance type to ensure that your index can be stored in RAM.

As your data volume grows, Amazon CloudSearch will scale your search domain to a larger search instance type (or partition your index across multiple instances if you are already on the largest search instance type). Conversely, when your data volume shrinks, CloudSearch scales your domain down to fewer search instances (or a smaller search instance type if your index fits on a single partition).

As with data volume, Amazon CloudSearch automatically scales your search domain to meet your traffic demands. When a search instance nears its maximum load, CloudSearch scales up your search domain by adding a search instance to handle the increased traffic. Conversely, when traffic drops, Amazon CloudSearch removes unneeded search instances to minimize costs.

For example, if your collection is large enough that it requires three partitions, your search domain will have three search instances (one for each partition). As your traffic increases beyond the processing capacity of each search instance, the partition is replicated to provide additional capacity. You now have a total of six search instances supporting the three partitions in your domain. Further increases in traffic will result in additional instances being added.
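The instance count in this example is simply partitions multiplied by replicas. The sketch below illustrates that relationship under a simplifying assumption that every partition is replicated the same number of times; the traffic figures are hypothetical, since the actual load thresholds CloudSearch uses are not stated here.

```python
import math


def search_instances(partitions: int, queries_per_second: float,
                     qps_per_replica_set: float) -> int:
    """Estimate total search instances for a domain.

    partitions: instances needed to hold the index once.
    qps_per_replica_set: hypothetical query rate one full copy of the
    index can serve before another copy is added.
    """
    if partitions < 1:
        raise ValueError("a domain needs at least one partition")
    replicas = max(1, math.ceil(queries_per_second / qps_per_replica_set))
    return partitions * replicas


# Usage: three partitions, with traffic exceeding what one copy can serve.
print(search_instances(3, queries_per_second=50, qps_per_replica_set=100))   # 3
print(search_instances(3, queries_per_second=150, qps_per_replica_set=100))  # 6
```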

Figure: Amazon CloudSearch elastic scaling.

The amount of data that each search instance type can support depends primarily on the size of your documents (the items in your collection of searchable data) and the configuration of the index fields. We will use a sample document and configuration for the public Wikipedia data set as a benchmark to illustrate the capacity of each search instance type.

In Amazon CloudSearch, documents are described using the Search Data Format (SDF). The JSON version of the sample Wikipedia document shown below is approximately 1 KB in size:

{ "type": "add",
  "id": "wikipedia26678",
  "version": 5465249,
  "lang": "en",
  "fields": {
      "title": "Star Wars",
      "url": "http://en.wikipedia.org/wiki/Star_Wars",
      "author": "Jedi94",
      "type": "Article",
      "year": "1977",
      "teaser": "The Star Wars title card/logo, as seen in all films. 
        'Star Wars' is an American epic space opera film series created by 
        George Lucas. The first film in the series was originally released 
        on May 25, 1977, under the title Star Wars, by 20th Century Fox, 
        and became a worldwide pop culture phenomenon, followed by two 
        sequels, released at three-year intervals. Sixteen years after the 
        release of the trilogy's final film, the first in a new prequel 
        trilogy of films was released. The three films were ..." 
  } 
}

Each of the fields in the sample document needs to be configured with multiple indexing options such as the type of the field and whether the field is searchable, facet enabled, or result enabled. Each of these options directly impacts the capacity of a search instance in terms of the number of documents. The table below shows a sample configuration for the index fields for the Wikipedia data set.

Name     Type     Search  Facet  Result
title    text
url      text
author   text
year     uint
type     literal
teaser   text

Based on the size of the document (1 KB) and the index configuration shown above, each search instance type can hold the following number of documents.

Search Instance Type          Data Capacity
Small Search Instance         1 million documents
Large Search Instance         4 million documents
Extra Large Search Instance   8 million documents

These limits are only an illustration. Different documents or a different configuration can drastically change the number of documents that an instance can hold. As you scale beyond the limit of a single Extra Large Search Instance, Amazon CloudSearch will automatically add up to 9 additional Extra Large Search Instances to scale your search fleet to support tens or hundreds of millions of documents. If you require additional scaling, please contact us.
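To make the benchmark figures concrete, here is a hedged Python sketch that maps a document count to an instance type and partition count. It applies only to the 1 KB sample document and sample configuration above, and the function name and the simple "smallest type that fits" rule are assumptions for illustration, not a description of how CloudSearch actually decides.

```python
# Benchmark capacities for the 1 KB sample document and configuration.
CAPACITY = [
    ("Small", 1_000_000),
    ("Large", 4_000_000),
    ("Extra Large", 8_000_000),
]
MAX_XL_PARTITIONS = 10  # one Extra Large instance plus up to 9 additional


def plan_for(doc_count: int):
    """Return (instance type, number of partitions) for a document count."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    for name, capacity in CAPACITY:
        if doc_count <= capacity:
            return name, 1
    xl_capacity = CAPACITY[-1][1]
    partitions = -(-doc_count // xl_capacity)  # integer ceiling division
    if partitions > MAX_XL_PARTITIONS:
        raise ValueError("beyond automatic scaling; additional scaling must be requested")
    return "Extra Large", partitions


# Usage with a few collection sizes.
for n in (500_000, 3_000_000, 20_000_000):
    print(n, plan_for(n))
```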

You can see an example of the cost breakdown in the Pricing section.

Architecture

Amazon CloudSearch manages the server resources needed to implement a custom search solution. It provides three simple sub-services that you use to:

  • Configure search domains
  • Upload documents for indexing
  • Submit search requests

Configuration Service

The configuration service enables you to create and configure search domains. Each domain encapsulates a collection of data you want to search.

To create a new search domain, you simply provide a name to refer to your search domain. Search domains can then be configured by specifying indexing options, text options, and rank expressions:

  • Indexing options specify the fields you want to include in your index. Using the AWS Management Console or the command line tools, you can scan your data to automatically configure default indexing options.
  • Text options enable you to specify domain-specific dictionaries to ignore certain words during indexing, define common synonyms for terms, and map variations of a word to a common stem to enable matching on all the variants.
  • Rank expressions are mathematical functions that you can use to change how search results are ranked. By default, documents are ranked by a text relevance score that takes into account the proximity of the search terms and the frequency of those terms within a document. You can use rank expressions to include other factors in the ranking. For example, if you have a numeric field in your domain called 'popularity,' you can define a rank expression that combines popularity with the default text relevance score to rank relevant popular documents higher in your search results.
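The effect of the 'popularity' rank expression described above can be illustrated in plain Python. This is not CloudSearch rank expression syntax; it only shows the idea of combining a text relevance score with a numeric field. The scores, the weight of 10, and the document ids are made up for the example.

```python
def rank(documents, expression):
    """Sort documents by a scoring function, highest score first."""
    return sorted(documents, key=expression, reverse=True)


# Hypothetical matches with a text relevance score and a 'popularity' field.
matches = [
    {"id": "a", "text_relevance": 300, "popularity": 1},
    {"id": "b", "text_relevance": 250, "popularity": 9},
    {"id": "c", "text_relevance": 100, "popularity": 5},
]


def by_relevance(doc):
    """Default-style ranking: text relevance only."""
    return doc["text_relevance"]


def by_relevance_and_popularity(doc):
    """Custom ranking: relevance plus a weighted popularity boost."""
    return doc["text_relevance"] + 10 * doc["popularity"]


print([d["id"] for d in rank(matches, by_relevance)])                 # ['a', 'b', 'c']
print([d["id"] for d in rank(matches, by_relevance_and_popularity)])  # ['b', 'a', 'c']
```

With the popularity boost, document b overtakes document a even though its text relevance score is lower.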

Document Service

You use the document service to make changes to a domain's searchable data. Every domain has a unique document service HTTP endpoint. When you send data to your domain, it is automatically indexed and the changes are made searchable in near real-time.

To send data to your domain, you need to describe it according to the Search Data Format (SDF). In SDF, each item that you want to be able to return as a search result is represented as a document. Every document has a unique id (docid), a version number, and one or more fields that contain the data that you want to search and return in results. Document fields can contain any UTF-8 string data. Your domain configuration's indexing options specify how you want to map the SDF document fields to fields in your search index.
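A minimal Python sketch of preparing SDF data follows. It mirrors the keys of the sample Wikipedia document shown earlier (type, id, version, lang, fields) and assumes that a batch is a JSON array of such operations and that the 5 MB batch limit means 5 x 1024 x 1024 bytes. The helper names are hypothetical, and the sketch only builds the request body; it does not send anything.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumption: 5 MB read as 5 MiB


def add_operation(doc_id, version, fields, lang="en"):
    """Build one SDF 'add' operation shaped like the sample document."""
    if not fields:
        raise ValueError("a document needs at least one field")
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": lang,
        "fields": fields,
    }


def encode_batch(operations):
    """Serialize operations as UTF-8 JSON and enforce the batch size limit."""
    body = json.dumps(operations, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_BATCH_BYTES:
        raise ValueError("batch exceeds the maximum batch size")
    return body


# Usage: a one-document batch based on the Wikipedia sample.
batch = encode_batch([
    add_operation("wikipedia26678", 5465249,
                  {"title": "Star Wars", "year": "1977", "type": "Article"}),
])
print(len(batch), "bytes")
```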

Search Service

The search service handles search requests for a domain. Every domain has a unique search HTTP endpoint. When you send a search request, the search service returns a list of documents sorted by relevance. Search results can be returned in either JSON or XML.

Amazon CloudSearch provides a rich query language that enables you to search within particular fields, perform complex Boolean searches, retrieve facet information, and specify what data you want the results to include.
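As an illustration, a search request is an HTTP GET against the domain's search endpoint. The Python sketch below only builds such a URL. The endpoint hostname is hypothetical, and the API version string and parameter names (q, size, results-type, return-fields, facet) are assumptions based on the 2011-02-01 search API; check them against the Amazon CloudSearch Developer Guide before relying on them.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"  # assumption: API version in use at the time


def search_url(endpoint, query, return_fields=(), facets=(),
               size=10, results_type="json"):
    """Build a search request URL for a domain's search endpoint."""
    params = [
        ("q", query),
        ("size", str(size)),
        ("results-type", results_type),  # json or xml
    ]
    if return_fields:
        params.append(("return-fields", ",".join(return_fields)))
    if facets:
        params.append(("facet", ",".join(facets)))
    return f"http://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


# Hypothetical endpoint; every domain has its own unique search endpoint.
url = search_url(
    "search-movies-example.us-east-1.cloudsearch.amazonaws.com",
    "star wars",
    return_fields=("title", "year"),
    facets=("type",),
)
print(url)
```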

You can use the search tester in the Amazon CloudSearch console to test sample queries.

Figure: Amazon CloudSearch architecture.

Getting Started

To get started with Amazon CloudSearch, you can use the Amazon CloudSearch Developer Guide and work through the Getting Started with Amazon CloudSearch tutorial.


Video: Introduction to Amazon CloudSearch

Introducing Amazon CloudSearch
To see a summary of Amazon CloudSearch features, please watch this video.

Video: Building a Search Application Using Amazon CloudSearch

Building a Search Application Using Amazon CloudSearch
To see how to use Amazon CloudSearch to develop a search application, including uploading and indexing a large public data set, setting up index fields, customizing ranking, and embedding search in a sample application, please watch this video.


Intended Usage and Restrictions

Your use of this service is subject to the Amazon Web Services Customer Agreement.

©2013, Amazon Web Services, Inc. or its affiliates. All rights reserved.