Amazon CloudSearch is a web service that allows you to quickly and easily make your data searchable. Once you upload the data to Amazon CloudSearch, your users can send search requests to find the most relevant results. Amazon CloudSearch gives you the full capabilities of a highly available search engine without the time-consuming tasks of managing and scaling it, freeing you up to focus on your applications and business. It is simple to use; with a few clicks of the AWS Management Console or a few API calls you can make your data searchable. As with all Amazon Web Services, there are no up-front investments required, and you only pay for the resources you use.
A search engine makes it possible to search large collections of mostly textual data items (called documents) to quickly find the best matching results. Search requests are usually a few words of unstructured text, such as "matt damon movies". The returned results are usually ranked with the best matching, or most relevant, items listed first (the ones that are most "about" the search words).
Documents may be completely unstructured, or they can contain multiple fields that can optionally be searched individually. For example, a search service for movies might have documents with fields for title, director, actor, description, and reviews. Results returned by a search engine are typically proxies for the underlying documents, such as URLs that reference particular web pages. However, the search service can also return the actual contents of individual fields.
Amazon CloudSearch is a fully managed search service that automatically scales with the volume of data and complexity of search requests to deliver fast and accurate results. Amazon CloudSearch lets customers add search capability without needing to manage hosts, traffic and data scaling, redundancy, or software packages. Users pay low hourly rates only for the resources consumed. Amazon CloudSearch can offer significantly lower total cost of ownership compared to operating and managing your own search environments.
A search service and a storage service are complementary. A search service requires that your documents already be stored somewhere, whether as files in a file system, objects in Amazon S3, or records in an Amazon SimpleDB or Amazon RDS instance. The search service is a rapid retrieval system that uses a process called indexing to make those items searchable with sub-second latencies.
Search engines and databases are not mutually exclusive; in fact, they are often used together. If you already have a database that contains structured data, you might want to use a search engine to intelligently filter and rank the database contents using search keywords as relevance criteria.
A search service can be used to index and search both structured and unstructured data. Content can come from multiple sources and can include database fields along with files in a variety of formats, web pages, and so on. A search service can support customizable result ranking as well as special search features, such as faceted filtering, that databases typically do not provide.
Amazon CloudSearch is currently available in the following AWS Regions: US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), and Asia Pacific (Singapore).
To sign up for Amazon CloudSearch, click the Sign Up Now button on the Amazon CloudSearch detail page and complete the sign-up process. You must have an Amazon Web Services account. If you do not already have one, you will be prompted to create an AWS account when you begin the Amazon CloudSearch sign-up process.
After you have signed up, select Amazon CloudSearch from the AWS Management Console. Using the Amazon CloudSearch console, you can quickly create a search domain, configure your search fields, upload sample data, and send search queries to your search domain. You can also use the Amazon CloudSearch APIs and command line tools to create and configure your search domain.
For more information, see the Getting Started tutorial in the Amazon CloudSearch Developer Guide.
Yes, Amazon CloudSearch comes with a rich set of command line tools, and is supported in the AWS SDKs for Java, .NET, PHP, and Ruby. For more information, see the Amazon CloudSearch Developer Guide and the Developing a Search Application Using CloudSearch demo.
A search domain is a data container and a set of services that make the data searchable. These services include:
You can create, manage, and delete search domains using the AWS Management Console, Amazon CloudSearch APIs, or Amazon CloudSearch command line tools.
Every search domain has a document service with a URL (document endpoint) that accepts document upload requests. You upload documents to your domain using the AWS Management Console, Amazon CloudSearch APIs, or the Amazon CloudSearch command line tools.
To make your data searchable, you need to describe your documents in the Search Data Format (SDF) and upload them in batches to the search domain. Amazon CloudSearch generates a search index from your SDF data according to the index fields and text options configured for the domain. As your data changes, you submit SDF document updates to add documents to or delete documents from your index. Amazon CloudSearch applies data updates continuously, so your changes become searchable in near real-time.
To create SDF batches that describe your data, you can create JSON or XML text files that conform to the SDF data conventions.
When creating SDF document batches, you need to provide the following information:
The following example shows a JSON version of an SDF batch:
Note that integer values such as the version and year are not enclosed in quotes, and that values in a multi-value field such as genre are listed in a JSON array.
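A batch with the properties just described can be generated rather than written by hand. The sketch below uses Python's standard json module and assumes the 2011-02-01 SDF layout, in which each add operation carries type, id, version, lang, and a fields object; the helper name add_op and all document IDs and field values are hypothetical.

```python
import json

def add_op(doc_id, version, fields, lang="en"):
    """Build one SDF 'add' operation (layout assumed from the 2011-02-01 SDF)."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}

# Hypothetical movie records; ids and values are illustrative only.
batch = [
    add_op("tt0000001", 1, {"title": "Example Movie", "year": 2007,
                            "genre": ["Adventure", "Drama"]}),
    add_op("tt0000002", 1, {"title": "Another Example", "year": 2011,
                            "genre": ["Comedy"]}),
]

# json.dumps writes Python ints as bare JSON numbers and Python lists as
# JSON arrays, matching the quoting rules for version, year, and genre.
sdf_text = json.dumps(batch, indent=2)
print(sdf_text)
```

Writing sdf_text to a file such as data.json gives you something you can upload with the console, the command line tools, or the HTTP API.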
To make this data available to Amazon CloudSearch, you can either save this text to a file and upload it using the console or the command line tools, or you can submit it directly using the HTTP API.
Once you have created SDF data, you can submit it to Amazon CloudSearch as the body of a POST request to the document endpoint for your domain.
For example, if you have an SDF batch in a file called data.json you can issue the following cURL request to submit the batch:
curl -X POST --upload-file data.json --header "Content-Type: application/json" doc-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com/2011-02-01/documents/batch
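If your application posts batches itself rather than shelling out to cURL, the same request can be prepared with Python's standard urllib.request. This is a sketch: the endpoint is a placeholder in the same form as the cURL example, the helper name build_batch_request and the batch contents are hypothetical, and the URL uses HTTPS, which Amazon CloudSearch supports for all requests.

```python
import json
import urllib.request

# Placeholder endpoint: substitute your domain's document endpoint.
DOC_ENDPOINT = "doc-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com"

def build_batch_request(endpoint, batch, api_version="2011-02-01"):
    """Prepare (but do not send) a POST of an SDF batch to the batch path."""
    url = f"https://{endpoint}/{api_version}/documents/batch"
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_batch_request(
    DOC_ENDPOINT, [{"type": "delete", "id": "tt0000002", "version": 2}]
)
# To submit, pass the prepared request to urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect or log a batch before it reaches the document endpoint.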
Documents are automatically indexed when you upload them to your search domain. You can also explicitly re-index your documents when you make configuration changes by sending an IndexDocuments request.
Certain configuration options, such as adding a new index field or updating your stemming or stopword dictionaries, are not available until your domain is re-indexed. When you have made changes that require indexing, the domain’s status will indicate that it needs to be indexed. You can initiate indexing from the AWS Management Console, with the IndexDocuments API, or using the cs-index-documents command line tool.
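Indexing can also be triggered from code. The sketch below assumes a client that behaves like boto3.client("cloudsearch"), whose index_documents call takes a DomainName and reports the affected field names under "FieldNames"; note that boto3 targets the newer 2013-01-01 API rather than the 2011-02-01 version used elsewhere on this page. The helper name start_indexing is hypothetical.

```python
def start_indexing(client, domain_name):
    """Ask the service to rebuild a domain's index.

    `client` is assumed to behave like boto3.client("cloudsearch");
    returns the list of field names the service reports.
    """
    resp = client.index_documents(DomainName=domain_name)
    return resp.get("FieldNames", [])
```

With boto3 installed and credentials configured, a call might look like start_indexing(boto3.client("cloudsearch", region_name="us-east-1"), "movies"), where "movies" is a hypothetical domain name.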
Every search domain has a REST-based search service with a unique URL (search endpoint) that accepts search requests for its document set. You can send search requests to the URL through a web browser, the AWS Management Console, or the Amazon CloudSearch APIs.
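Because the search service is plain HTTP, a request is just a URL. The sketch below builds one with urllib.parse.urlencode. The /2011-02-01/search path and the q, size, and return-fields parameter names are assumptions based on the 2011-02-01 search API (check the Developer Guide for the exact set), the host is a placeholder, and the helper name search_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder search endpoint; substitute your domain's own.
SEARCH_ENDPOINT = "search-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com"

def search_url(endpoint, query, return_fields=(), size=10, api_version="2011-02-01"):
    """Build a GET URL for a simple text search (parameter names assumed)."""
    params = {"q": query, "size": size}
    if return_fields:
        params["return-fields"] = ",".join(return_fields)
    return f"https://{endpoint}/{api_version}/search?{urlencode(params)}"

url = search_url(SEARCH_ENDPOINT, "matt damon", return_fields=["title", "year"])
print(url)
```

The resulting URL can be pasted into a browser or fetched with any HTTP client.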
No, not at this time.
At this time, there is no way to automatically migrate a search domain from one region to another. You will need to create a new domain in the target region, configure the domain and upload your data, then delete the original domain.
To delete a search domain, click the Delete Domain button in the Amazon CloudSearch console, or issue the cs-delete-domain command using the Command Line Tools.
To delete a document, you specify a delete operation in your SDF data that contains the id of the document you want to remove and a version number greater than the current version number for that document.
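The version rule is easy to enforce before a batch is sent. In this sketch the helper name make_delete is hypothetical, the operation layout (type, id, version) is assumed from the 2011-02-01 SDF, and the caller is assumed to know the document's current version.

```python
def make_delete(doc_id, new_version, current_version):
    """Return an SDF delete operation for doc_id.

    Raises ValueError unless new_version is greater than the version
    currently stored for the document, as deletes require.
    """
    if new_version <= current_version:
        raise ValueError(
            f"version {new_version} must exceed current version {current_version}"
        )
    return {"type": "delete", "id": doc_id, "version": new_version}
```

Failing early on the client side avoids submitting a delete that Amazon CloudSearch will not apply.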
You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint.
If you wish to maintain your domain’s endpoints, you can send a delete for each document that is in your domain.
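Deleting every document usually means sending many delete operations, split across several batches. The sketch below groups them into JSON batches no larger than a caller-chosen byte limit; the helper name delete_batches is hypothetical, max_bytes is an assumption you should set from the batch-size limit in the Developer Guide, and a single version is used for every delete, so it must exceed each document's current version.

```python
import json

def delete_batches(doc_ids, version, max_bytes):
    """Yield JSON SDF batches of delete operations, each at most max_bytes.

    A single operation longer than max_bytes is still emitted on its own.
    """
    parts, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc_id in doc_ids:
        op = json.dumps({"type": "delete", "id": doc_id, "version": version})
        op_len = len(op.encode("utf-8"))
        extra = op_len + (2 if parts else 0)  # ", " separator between ops
        if parts and size + extra > max_bytes:
            yield "[" + ", ".join(parts) + "]"
            parts, size, extra = [], 2, op_len
        parts.append(op)
        size += extra
    if parts:
        yield "[" + ", ".join(parts) + "]"
```

Each yielded string can be posted to the document endpoint as its own batch.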
If you do not need to maintain your domain’s endpoints, you can create a new search domain by copying the configuration from the domain you wish to delete, and then delete the old domain.
Amazon CloudSearch provides features to index and search both structured data and plain text, including faceted search, free text search, Boolean search expressions, customizable relevance ranking, query time rank expressions, field weighting, searching and sorting of results using any field, and text processing options including tokenization, stopwords, stemming and synonyms. It also provides near real-time indexing for document updates.
To enable more advanced features such as faceting or fielded search, you can add structure to your documents by formatting them in XML or JSON as a collection of attribute-value pairs. Amazon CloudSearch requires structured text to be in a particular format called Search Data Format (SDF). Below is an example of an XML document in SDF for a DVD and its reviews. A search request could be made searching for keywords in the customer reviews such as "best movie", or for a match on the director's name such as "lucas".
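A document of that shape can be generated with Python's xml.etree.ElementTree. The element and attribute names (batch, add, field, id, version, lang) and the use of repeated field elements for a multi-valued field such as review are assumptions based on the 2011-02-01 XML SDF; the helper name xml_add, the document ID, and the review text are hypothetical.

```python
import xml.etree.ElementTree as ET

def xml_add(batch, doc_id, version, fields, lang="en"):
    """Append an <add> element; list values become repeated <field> elements."""
    add = ET.SubElement(batch, "add", id=doc_id, version=str(version), lang=lang)
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(add, "field", name=name)
            field.text = str(v)
    return add

batch = ET.Element("batch")
xml_add(batch, "dvd_0001", 1, {
    "title": "Star Wars",
    "director": "George Lucas",
    "review": ["The best movie of the decade.", "A classic space adventure."],
})
print(ET.tostring(batch, encoding="unicode"))
```

A keyword search for "best movie" would then look in the review fields, while a fielded search for "lucas" would look only in director.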
Faceting allows you to categorize your search results into refinements on which the user can further search. For example, a user might search for "umbrellas", and facets allow you to group the results by price, such as $0-$10, $10-$20, $20-$40, and so on. Amazon CloudSearch also allows for result counts to be included in facets, so that each refinement has a count of the number of documents in that group. The example could then be: $0-$10 (4 items), $10-$20 (123 items), $20-$40 (57 items), and so on.
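To make the idea of facet counts concrete, the sketch below counts items per price range on the client side. It only illustrates what the counts mean, not how Amazon CloudSearch computes them; treating each range as half-open ($10.00 falls in $10-$20, not $0-$10) is an assumption, since the example ranges share endpoints. The helper name facet_counts is hypothetical.

```python
from collections import Counter

# Price ranges from the umbrella example, as half-open intervals [low, high).
BUCKETS = [(0, 10), (10, 20), (20, 40)]

def facet_counts(prices, buckets=BUCKETS):
    """Return (label, count) pairs, one per price range, in range order."""
    counts = Counter()
    for price in prices:
        for low, high in buckets:
            if low <= price < high:
                counts[(low, high)] += 1
                break
    return [(f"${low}-${high}", counts[(low, high)]) for low, high in buckets]

print(facet_counts([3.5, 12, 15, 25, 9.99]))
# [('$0-$10', 2), ('$10-$20', 2), ('$20-$40', 1)]
```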
Amazon CloudSearch only supports English at this time.
Documents uploaded to a search domain typically become searchable within seconds to a few minutes.
There is no intrinsic limit on the number of search requests that can be sent to a search domain.
Your search requests are typically processed within a few hundred milliseconds, frequently much faster. Latency is affected by many factors, including the time it takes for your requests and responses to travel between your own application and your search domain, the complexity of your search requests, and how heavily you are using your search domain.
Amazon CloudSearch is designed to efficiently process a wide range of search requests very quickly. Search requests vary in complexity depending on the expressions that determine which documents match (the match set) and additional criteria that determine how closely each document matches (the rank function). Search requests that match a large number of documents take longer to process than those that match very few documents. Search requests that compute complex rank functions take longer to process than those that rank using a very simple criterion such as a single field. To help you understand the difference in complexity between search requests, the CPU consumed by every request is returned as part of the response.
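To compare requests, you can pull the reported figures out of each response. This sketch assumes the 2011-02-01 JSON response layout, where the match count sits under hits.found and timing sits under info as time-ms and cpu-time-ms; adjust the keys if your responses differ. The helper name request_cost and the sample body are hypothetical.

```python
import json

def request_cost(response_text):
    """Extract the match count and timing figures from a search response."""
    resp = json.loads(response_text)
    info = resp.get("info", {})
    return {
        "found": resp.get("hits", {}).get("found"),
        "time_ms": info.get("time-ms"),
        "cpu_time_ms": info.get("cpu-time-ms"),
    }

# Hypothetical response body, trimmed to the relevant keys.
sample = ('{"hits": {"found": 7, "start": 0, "hit": []}, '
          '"info": {"rid": "abc", "time-ms": 3, "cpu-time-ms": 1}}')
print(request_cost(sample))  # {'found': 7, 'time_ms': 3, 'cpu_time_ms': 1}
```

Logging these values alongside the query makes it easier to spot requests whose match sets or rank functions are unusually expensive.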
Amazon CloudSearch is available in the US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), and Asia Pacific (Singapore) Regions. Applications hosted in the same AWS Region as your search domain will experience the fastest communication times.
A search instance is a single search engine in the cloud that indexes documents and responds to search requests. It has a finite amount of RAM and CPU resources for indexing data and processing requests.
A search partition is the portion of your data that fits on a single search instance. A search domain can have one or more search partitions, and the number of search partitions can change as your documents are indexed.
Search domains scale in two dimensions: data and traffic. As your data volume grows, you need more (or larger) search instances to contain your indexed data, and your index is partitioned among the search instances. As your request volume or request complexity increases, each search partition must be replicated to provide additional CPU for that partition. For example, if your data requires three search partitions, you will have three search instances in your search domain. As your traffic increases beyond the capacity of a single search instance, each partition is replicated to provide additional CPU capacity, adding another three search instances to your search domain. Further increases in traffic will result in additional replicas, up to a maximum of 5, for each search partition.
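The arithmetic in that example is a product of partitions and copies per partition. In the sketch below, reading "a maximum of 5" as at most five copies of each partition (the original plus its replicas) is an assumption, since the wording leaves this open; the helper and constant names are hypothetical.

```python
# Assumption: the limit of 5 counts every copy of a partition, original included.
MAX_COPIES_PER_PARTITION = 5

def instance_count(partitions, copies_per_partition):
    """Total search instances when every partition has the same number of copies."""
    if not 1 <= copies_per_partition <= MAX_COPIES_PER_PARTITION:
        raise ValueError("copies_per_partition must be between 1 and 5")
    return partitions * copies_per_partition

print(instance_count(3, 1))  # 3: one instance per partition
print(instance_count(3, 2))  # 6: each partition replicated once
```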
The number of partitions you need depends on your data and configuration, so the maximum amount of data you can upload is whatever data set, with your search configuration applied, results in 10 search partitions. When you exceed your search partition limit, your domain will stop accepting uploads until you delete documents and re-index your domain. If you need more than 10 search partitions, please contact us.
CloudSearch is a fully managed search service that automatically scales your search domain and selects the number and type of search instances. All search instances in a given search domain are of the same type and this type can change over time as your data or traffic grows. It is not possible to choose the search instance type or the number of search instances directly.
You can find out the number and type of search instances in your search domain by using the AWS Management Console, the Amazon CloudSearch APIs, or the Amazon CloudSearch Command Line Tools. The number and type of search instances change over time, scaling up and down automatically according to your indexable data and search traffic.
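From code, the same information is available through a describe call. This sketch assumes a client that behaves like boto3.client("cloudsearch"), whose describe_domains call returns a DomainStatusList with SearchInstanceType, SearchPartitionCount, and SearchInstanceCount keys; boto3 uses the newer 2013-01-01 API rather than the 2011-02-01 version referenced on this page. The helper name instance_summary is hypothetical.

```python
def instance_summary(client, domain_name):
    """Return (instance type, partition count, instance count) for one domain.

    `client` is assumed to behave like boto3.client("cloudsearch").
    """
    resp = client.describe_domains(DomainNames=[domain_name])
    statuses = resp.get("DomainStatusList", [])
    if not statuses:
        raise LookupError(f"domain {domain_name!r} not found")
    status = statuses[0]
    return (
        status.get("SearchInstanceType"),
        status.get("SearchPartitionCount"),
        status.get("SearchInstanceCount"),
    )
```

Because these values change as the domain scales, polling them periodically gives a rough picture of how your domain is responding to data and traffic growth.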
Search domains typically react to increases in traffic within minutes. Adjustments for changes in data volume or for a reduction in traffic may take longer, but you can accelerate this process by invoking an IndexDocuments operation.
To send your data over a secure, encrypted SSL connection, use HTTPS instead of HTTP when you connect to Amazon CloudSearch.
We do not support user-generated encryption keys. You will need to decrypt the data and upload it using HTTPS.
Yes. We support HTTPS for all Amazon CloudSearch requests.
You can restrict your data requests to specific IP subnets through Amazon CloudSearch access policies. For finer-grained user access control, you will need to authenticate each request before sending it to your search domain.
For more information on security on AWS, see Amazon Web Services: Overview of Security Processes and the Windows on Amazon EC2 Security Guide.
There are no set-up fees or commitments to begin using the service. At the end of each month, your credit card is automatically charged for that month's usage. You can view your charges for the current billing period at any time on the AWS web site by logging into your Amazon Web Services account and clicking Account Activity under Your Web Services Account.
For detailed pricing information, see Amazon CloudSearch Pricing.
Yes, a free trial is available for new CloudSearch customers. For more information, see Amazon CloudSearch 30 Day Free Trial.