Q: What is Amazon CloudSearch?
Amazon CloudSearch is a web service that allows you to quickly and easily make your data searchable. Once you upload the data to Amazon CloudSearch, your users can send search requests to find the most relevant results. Amazon CloudSearch gives you the full capabilities of a highly-available search engine without the time-consuming tasks of managing and scaling it, freeing you up to focus on your applications and business. It is simple to use; with a few clicks of the AWS Management Console or a few API calls you can make your data searchable. As with all Amazon Web Services, there are no up-front investments required, and you only pay for the resources you use.
Q: What is a search engine?
A search engine makes it possible to search large collections of mostly textual data items (called documents) to quickly find the best matching results. Search requests are usually a few words of unstructured text, such as "matt damon movies". The returned results are usually ranked with the best matching, or most relevant, items listed first (the ones that are most "about" the search words).
Documents may be completely unstructured, or they can contain multiple fields that can optionally be searched individually. For example, a search service for movies might have documents with fields for title, director, actor, description, and reviews. Results returned by a search engine are typically proxies for the underlying documents, such as URLs that reference particular web pages. However, the search service can also return the actual contents of individual fields.
Q: What benefits does Amazon CloudSearch offer?
Amazon CloudSearch is a fully managed search service that automatically scales with the volume of data and complexity of search requests to deliver fast and accurate results. Amazon CloudSearch lets customers add search capability without needing to manage hosts, traffic and data scaling, redundancy, or software packages. Users pay low hourly rates only for the resources consumed. Amazon CloudSearch can offer significantly lower total cost of ownership compared to operating and managing your own search environment.
Q: Can Amazon CloudSearch be used with a storage service?
A search service and a storage service are complementary. A search service requires that your documents already be stored somewhere, whether as files in a file system, data in Amazon S3, or records in an Amazon SimpleDB or Amazon RDS instance. The search service is a rapid retrieval system that makes those items searchable with sub-second latencies through a process called indexing.
Q: Can Amazon CloudSearch be used with a database?
Search engines and databases are not mutually exclusive; in fact, they are often used together. If you already have a database that contains structured data, you might want to use a search engine to intelligently filter and rank the database contents using search keywords as relevance criteria.
A search service can be used to index and search both structured and unstructured data. Content can come from multiple sources and can include database fields along with files in a variety of formats, web pages, and so on. A search service can support customizable result ranking as well as special search features such as using facets for filtering that are not available in databases.
Q: Which AWS Regions is Amazon CloudSearch available in?
Amazon CloudSearch is currently available in the following AWS Regions: US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), and Asia Pacific (Singapore).
Q: How do I get started with Amazon CloudSearch?
To sign up for Amazon CloudSearch, click the Create Free Account button on the Amazon CloudSearch detail page and complete the sign-up process. You must have an Amazon Web Services account. If you do not already have one, you will be prompted to create an AWS account when you begin the Amazon CloudSearch sign-up process.
After you have signed up, select Amazon CloudSearch from the AWS Management Console. Using the Amazon CloudSearch console, you can quickly create a search domain, configure your search fields, upload sample data, and send search queries to your search domain. You can also use the Amazon CloudSearch APIs and command line tools to create and configure your search domain.
Q: Are there any developer tools available?
Yes, Amazon CloudSearch comes with a rich set of command line tools, and is supported in the AWS SDKs for Java, .NET, PHP, and Ruby. For more information, see the Amazon CloudSearch Command Line Tool Reference and the Developing a Search Application Using CloudSearch demo. For a list of SDKs, APIs, and other tools, see the Amazon CloudSearch Developer Resources topic in the CloudSearch forum.
Q: Where can I find sample code for Amazon CloudSearch?
You can download sample application code from the cloudsearchdemos repository on GitHub.
Q: What is a search domain and how do I create one?
A search domain is a data container and a set of services that make the data searchable. These services include:
- A document service that allows you to upload data to your domain for indexing.
- A search service that allows you to perform search requests against your indexed data.
- A configuration service for controlling your domain's behavior (including relevance ranking).
You can create, manage, and delete search domains using the AWS Management Console, Amazon CloudSearch APIs, or Amazon CloudSearch command line tools.
Q: How do I upload documents to my search domain?
Every search domain has a document service with a URL (document endpoint) that accepts document upload requests. You upload documents to your domain using the AWS Management Console, Amazon CloudSearch APIs, or the Amazon CloudSearch command line tools.
Q: Do my documents need to be in a particular format?
To make your data searchable, you need to describe your documents in the Search Data Format (SDF) and upload them in batches to the search domain. Amazon CloudSearch generates a search index from your SDF data according to the index fields and text options configured for the domain. As your data changes, you submit SDF document updates to add documents to, or delete documents from, your index. Amazon CloudSearch applies data updates continuously, so your changes become searchable in near real-time.
Q: How do I create documents formatted in the Search Data Format (SDF)?
To create SDF batches that describe your data, you can create JSON or XML text files that conform to the SDF data conventions.
When creating SDF document batches, you need to provide the following information:
- The operation type: add or delete
- A unique identifier
- A version number (must be incremented in subsequent add or delete operations)
- The two-letter language code (such as en for English)
- The actual fields and their data
In the JSON version of an SDF batch, integer values such as the version and year are not enclosed in quotes, and the values of a multi-value field such as genre are listed in a JSON array.
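As a rough sketch of such a batch, the Python snippet below assembles a single add operation and serializes it to JSON. The document identifier, the field names (title, year, genre), and their values are illustrative assumptions, as is the helper name make_add; the top-level keys mirror the items listed above (operation type, identifier, version, language code, and fields).

```python
import json


def make_add(doc_id, version, fields, lang="en"):
    """Build one SDF 'add' operation as a plain dictionary (hypothetical helper)."""
    return {
        "type": "add",       # operation type: add or delete
        "id": doc_id,        # unique identifier
        "version": version,  # integer, so it is serialized without quotes
        "lang": lang,        # two-letter language code
        "fields": fields,    # the actual fields and their data
    }


batch = [
    make_add(
        "movie_0001",  # hypothetical document identifier
        1,
        {
            "title": "Example Movie",
            "year": 2007,                    # integer field, unquoted in JSON
            "genre": ["Adventure", "Drama"]  # multi-value field becomes a JSON array
        },
    )
]

sdf_json = json.dumps(batch, indent=2)
print(sdf_json)
```

Building the batch from dictionaries and letting json.dumps handle quoting avoids accidentally quoting integers such as the version number.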
To make this data available to Amazon CloudSearch, you can either save this text to a file and upload it using the console or the command line tools, or you can submit it directly using the HTTP API.
Q: How do I upload SDF documents using the APIs?
Once you have created SDF data, you can submit it to Amazon CloudSearch as the body of a POST request to the document endpoint for your domain. For example, if you have an SDF batch in a file called data.json, you can issue the following cURL request to submit the batch:
curl -X POST --upload-file data.json --header "Content-Type: application/json" doc-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com/2011-02-01/documents/batch
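The same request can be prepared from Python using only the standard library. This is a minimal sketch, not an official client: the endpoint is the placeholder host from the cURL example (the https scheme is an assumption), the function name build_upload_request is hypothetical, and the actual network call is left commented out so the snippet runs without a live domain.

```python
import json
import urllib.request

# Placeholder endpoint taken from the cURL example; substitute your own
# domain's document endpoint. The https scheme is an assumption.
DOC_ENDPOINT = (
    "https://doc-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com"
    "/2011-02-01/documents/batch"
)


def build_upload_request(batch, endpoint=DOC_ENDPOINT):
    """Wrap an SDF batch (a list of operations) in a JSON POST request."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical one-operation batch used only to exercise the function.
request = build_upload_request(
    [{"type": "delete", "id": "movie_0001", "version": 2}]
)
print(request.get_method(), request.full_url)

# To actually send the batch, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```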
Q: How do my documents get indexed?
Documents are automatically indexed when you upload them to your search domain. You can also explicitly re-index your documents when you make configuration changes by sending an IndexDocuments request.
Q: When do I need to re-index my domain?
Certain configuration changes, such as adding a new index field or updating your stemming or stopword dictionaries, do not take effect until your domain is re-indexed. When you have made changes that require indexing, the domain’s status will indicate that it needs to be indexed. You can initiate indexing from the AWS Management Console, with the IndexDocuments API, or using the cs-index-documents command line tool.
Q: How do I send search requests to my search domain?
Every search domain has a REST-based search service with a unique URL (search endpoint) that accepts search requests for its document set. You can send search requests to the URL through a web browser, the AWS Management Console, or the Amazon CloudSearch APIs.
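As an illustration, a search URL can be assembled with Python's standard library. The host below is a hypothetical search endpoint modelled on the document endpoint naming, and the query parameter names (q, size, return-fields) are assumptions about the 2011-02-01 search API that should be checked against the Developer Guide before use.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; substitute the one shown for your domain.
SEARCH_ENDPOINT = (
    "https://search-domain-name-domainid.us-east-1.cloudsearch.amazonaws.com"
    "/2011-02-01/search"
)


def build_search_url(query, return_fields=(), size=10, endpoint=SEARCH_ENDPOINT):
    """Return a GET URL for a free-text search request."""
    params = [("q", query), ("size", size)]
    if return_fields:
        # Assumed parameter: comma-separated list of fields to return.
        params.append(("return-fields", ",".join(return_fields)))
    return endpoint + "?" + urlencode(params)


url = build_search_url("matt damon movies", return_fields=["title", "director"])
print(url)
```

Because the search service is REST-based, the resulting URL can be pasted into a web browser or fetched from application code.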
Q: Can a search domain span multiple regions or Availability Zones?
No, not at this time.
Q: Can I move a search domain from one region to another?
At this time, there is no way to automatically migrate a search domain from one region to another. You will need to create a new domain in the target region, configure the domain and upload your data, then delete the original domain.
Q: How do I delete my search domain?
To delete a search domain, click the Delete Domain button in the Amazon CloudSearch console, or issue the cs-delete-domain command using the command line tools.
Q: How do I delete documents from my search domain?
To delete documents, you specify a delete operation in your SDF batch that contains the id of the document you want to remove and a document version number greater than the current version number for that document.
You can submit data updates through the Amazon CloudSearch console, using the cs-post-sdf command, or by posting a request directly to the domain's document service endpoint.
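A delete batch can be sketched in Python as follows. The helper name make_delete_batch, the document identifiers, and the version numbers are hypothetical; the sketch assumes you track the current version of each document yourself and simply adds one to it so that the delete carries a greater version number.

```python
import json


def make_delete_batch(current_versions):
    """Build SDF delete operations from a mapping of document id -> current version."""
    return [
        {
            "type": "delete",
            "id": doc_id,
            # The delete must carry a version greater than the current one.
            "version": version + 1,
        }
        for doc_id, version in sorted(current_versions.items())
    ]


# Hypothetical documents and their current versions.
delete_batch = make_delete_batch({"movie_0001": 1, "movie_0002": 7})
print(json.dumps(delete_batch, indent=2))
```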
Q: How do I empty my search domain?
If you wish to maintain your domain’s endpoints, you can send a delete for each document that is in your domain.
If you do not need to maintain your domain’s endpoints, you can create a new search domain by copying the configuration from the domain you wish to delete, and then delete the old domain.
Q: What search features does Amazon CloudSearch provide?
Amazon CloudSearch provides features to index and search both structured data and plain text, including faceted search, free text search, Boolean search expressions, customizable relevance ranking, query-time rank expressions, field weighting, searching and sorting of results using any field, and text processing options including tokenization, stopwords, stemming, and synonyms. It also provides near real-time indexing for document updates.
Q: What is structured text?
To enable more advanced features such as faceting or fielded search, you can add structure to your documents by formatting them in XML or JSON as a collection of attribute-value pairs. Amazon CloudSearch requires structured text to be in a particular format called Search Data Format (SDF). For example, given an SDF document that describes a DVD and its reviews, a search request could look for keywords in the customer reviews, such as "best movie", or for a match on the director's name, such as "lucas".
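To make the idea of attribute-value pairs concrete, the sketch below generates an XML document for a DVD with Python's xml.etree.ElementTree. The element and attribute names (batch, add, field, name) are assumptions about the XML flavour of SDF and should be verified against the Developer Guide; the identifier, field names, and review text are invented for illustration.

```python
import xml.etree.ElementTree as ET


def make_xml_batch(doc_id, version, fields, lang="en"):
    """Serialize one add operation as XML (assumed SDF element names)."""
    batch = ET.Element("batch")
    add = ET.SubElement(
        batch, "add", {"id": doc_id, "version": str(version), "lang": lang}
    )
    for name, value in fields.items():
        # A multi-value field is written as one <field> element per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(add, "field", {"name": name})
            field.text = str(item)
    return ET.tostring(batch, encoding="unicode")


xml_text = make_xml_batch(
    "dvd_0001",  # hypothetical identifier
    1,
    {
        "title": "Example DVD",
        "director": "Lucas, George",
        "review": ["Best movie of the year", "Great special effects"],
    },
)
print(xml_text)
```

With fields separated this way, a fielded search can target only the review text or only the director name.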
Q: What is faceting?
Faceting allows you to categorize your search results into refinements on which the user can further search. For example, a user might search for "umbrellas", and facets allow you to group the results by price, such as $0-$10, $10-$20, $20-$40, and so on. Amazon CloudSearch also allows for result counts to be included in facets, so that each refinement has a count of the number of documents in that group. The example could then be: $0-$10 (4 items), $10-$20 (123 items), $20-$40 (57 items), and so on.
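The grouping and counting behind the umbrella example can be illustrated locally in a few lines of Python. This only demonstrates the concept; it is not how Amazon CloudSearch computes facets. The sample prices are invented, and treating each range as including its lower bound and excluding its upper bound is an assumption.

```python
from collections import Counter

# Price ranges from the example above, as (low, high) pairs.
PRICE_BUCKETS = [(0, 10), (10, 20), (20, 40)]


def bucket_label(price):
    """Return the facet label for a price, or 'other' if no range matches."""
    for low, high in PRICE_BUCKETS:
        if low <= price < high:  # assumed half-open ranges
            return f"${low}-${high}"
    return "other"


def facet_counts(prices):
    """Count how many matching documents fall into each price range."""
    return Counter(bucket_label(price) for price in prices)


# Hypothetical prices of the documents that matched "umbrellas".
counts = facet_counts([4.99, 8.00, 12.50, 15.00, 25.00, 39.99])
for label, count in counts.items():
    print(f"{label} ({count} items)")
```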
Q: What languages does Amazon CloudSearch support?
Amazon CloudSearch only supports English at this time.
Q: Does Amazon CloudSearch support geospatial search?
Although Amazon CloudSearch does not have a native type to support latitude and longitude, you can implement geographically based searching and sorting by representing latitude and longitude as integers. For more information, see Searching and Ranking Results by Geographic Location in the Amazon CloudSearch Developer Guide, download the CloudSearchGeoSpatial demo from GitHub, and view the Building Location-Based Search With Amazon CloudSearch webinar.
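One possible integer encoding is sketched below. It is not the method from the Developer Guide or the demo: the offsets, the scale factor, and the use of squared straight-line distance as a sort key are all assumptions chosen to keep the values non-negative and the arithmetic simple, and the store names and coordinates are made up.

```python
SCALE = 10_000  # hypothetical precision: four decimal places of a degree


def encode_lat(lat):
    """Map latitude in [-90, 90] to a non-negative integer."""
    return int(round((lat + 90.0) * SCALE))


def encode_lon(lon):
    """Map longitude in [-180, 180] to a non-negative integer."""
    return int(round((lon + 180.0) * SCALE))


def squared_distance(a, b):
    """Squared straight-line distance between two encoded (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


# Hypothetical store locations, encoded the way they would be indexed.
stores = {
    "store_a": (encode_lat(47.7), encode_lon(-122.3)),
    "store_b": (encode_lat(45.5), encode_lon(-122.7)),
}
user = (encode_lat(47.6), encode_lon(-122.3))

# Sort store names from nearest to farthest.
nearest_first = sorted(stores, key=lambda name: squared_distance(stores[name], user))
print(nearest_first)
```

Straight-line distance on encoded degrees is only a rough approximation of real distance, which is usually acceptable for ranking nearby results but not for precise measurement.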
Q: How quickly will my uploaded documents become searchable?
Documents uploaded to a search domain typically become searchable within seconds to a few minutes.
Q: How many search requests can I send to my search domain?
There is no intrinsic limit on the number of search requests that can be sent to a search domain.
Q: What factors affect the latency of my search requests?
Your search requests are typically processed within a few hundred milliseconds, and frequently much faster. Latency is affected by many factors, including the time it takes for your requests and responses to travel between your application and your search domain, the complexity of your search request, and how heavily you are using your search domain.
Q: What makes one search request more complex than another?
Amazon CloudSearch is designed to efficiently process a wide range of search requests very quickly. Search requests vary in complexity depending on the expressions that determine which documents match (the match set) and additional criteria that determine how closely each document matches (the rank function). Search requests that match a large number of documents take longer to process than those that match very few documents. Search requests that compute complex rank functions take longer to process than those that rank using a very simple criterion such as a single field. To help you understand the difference in complexity between search requests, the CPU time consumed by every request is returned as part of the response.
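The following sketch shows how an application might pull the timing information out of a JSON search response. The sample response is hand-written for illustration, and the key names used here (hits, found, info, time-ms, cpu-time-ms) are assumptions about the response layout that should be confirmed against a real response from your domain.

```python
import json

# Hand-written sample; a real response will contain your own data.
SAMPLE_RESPONSE = """
{
  "hits": {"found": 2, "start": 0,
           "hit": [{"id": "movie_0001"}, {"id": "movie_0002"}]},
  "info": {"rid": "example-request-id", "time-ms": 3, "cpu-time-ms": 1}
}
"""


def summarize(response_text):
    """Extract the match count and timing figures from a search response."""
    response = json.loads(response_text)
    info = response.get("info", {})
    return {
        "found": response["hits"]["found"],
        "elapsed_ms": info.get("time-ms"),   # assumed key name
        "cpu_ms": info.get("cpu-time-ms"),   # assumed key name
    }


summary = summarize(SAMPLE_RESPONSE)
print(summary)
```

Logging these figures per request makes it easier to see which match expressions or rank functions are the expensive ones.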
Q: Where should I run my search application to minimize communication time with my search domain?
Amazon CloudSearch is available in the US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), and Asia Pacific (Singapore) Regions. Applications hosted in the same AWS Region as your search domain will experience the fastest communication times.
Q: What is a search instance?
A search instance is a single search engine in the cloud that indexes documents and responds to search requests. It has a finite amount of RAM and CPU resources for indexing data and processing requests.
Q: What is a search partition?
A search partition is the portion of your data which fits on a single search instance. A search domain can have one or more search partitions, and the number of search partitions can change as your documents are indexed.
Q: How does my search domain scale to meet my application needs?
Search domains scale in two dimensions: data and traffic. As your data volume grows, you need more (or larger) search instances to contain your indexed data, and your index is partitioned among the search instances. As your request volume or request complexity increases, each search partition must be replicated to provide additional CPU for that search partition. For example, if your data requires three search partitions, you will have three search instances in your search domain. As your traffic increases beyond the capacity of a single search instance, each partition is replicated to provide additional CPU capacity, adding an additional three search instances to your search domain. Further increases in traffic will result in additional replicas, to a maximum of 5, for each search partition.
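The arithmetic in this example can be written down as a small function. It is a simplified model under stated assumptions: every partition has the same number of copies (the original plus its replicas), and the function does not try to model how the maximum of 5 is counted, since that detail is not spelled out here. The name instance_count is hypothetical.

```python
def instance_count(partitions, copies_per_partition=1):
    """Total search instances when each partition has the same number of copies.

    copies_per_partition counts the original partition plus any replicas.
    """
    if partitions < 1 or copies_per_partition < 1:
        raise ValueError("partitions and copies_per_partition must be at least 1")
    return partitions * copies_per_partition


# Three partitions, no replication: three search instances.
print(instance_count(3))
# Each partition replicated once: three additional instances, six in total.
print(instance_count(3, copies_per_partition=2))
```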
Q: How much data can I upload to my search domain?
The number of search partitions you need depends on your data and configuration, so the maximum amount of data you can upload is whatever data set produces 10 search partitions once your search configuration is applied. When you exceed your search partition limit, your domain will stop accepting uploads until you delete documents and re-index your domain. If you need more than 10 search partitions, please contact us.
Q: Do I need to select the number and type of search instances for my search domain?
CloudSearch is a fully managed search service that automatically scales your search domain and selects the number and type of search instances. All search instances in a given search domain are of the same type and this type can change over time as your data or traffic grows. It is not possible to choose the search instance type or the number of search instances directly.
Q: How do I find out the number and type of search instances in my search domain?
You can find out the number and type of search instances in your search domain by using the AWS Management Console, the Amazon CloudSearch APIs, or the Amazon CloudSearch command line tools. The number and type of search instances change over time and automatically scale up and down according to your indexable data and search traffic.
Q: How quickly does my search domain scale to accommodate changes in data and traffic?
Search domains typically react to increases in traffic within minutes. Changes in data volume or a reduction in traffic may take longer, but you can accelerate this process by invoking an IndexDocuments operation.
Q: How do I upload my data to Amazon CloudSearch securely?
You send us your data using a secure and encrypted SSL connection by using HTTPS instead of HTTP when you connect to Amazon CloudSearch.
Q: My data is already encrypted. Can I just send you the encrypted data and the encryption key?
We do not support user-generated encryption keys. You will need to decrypt the data and upload it using HTTPS.
Q: Do you support encrypted search results?
Yes. We support HTTPS for all Amazon CloudSearch requests.
Q: How can I prevent specific users from accessing my search domain?
You can restrict your data requests to specific IP subnets through Amazon CloudSearch access policies. For finer-grained user access control, you will need to authenticate the request in your own application before sending it to your search domain.
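To illustrate what restricting requests to an IP subnet means, the sketch below checks a client address against an allow list using Python's ipaddress module. This is the kind of check your own application layer might perform before forwarding a request; it is not the Amazon CloudSearch access policy syntax, and the subnet shown is a reserved documentation range used as a placeholder.

```python
import ipaddress

# Placeholder allow list; 192.0.2.0/24 is reserved for documentation.
ALLOWED_SUBNETS = [ipaddress.ip_network("192.0.2.0/24")]


def is_allowed(client_ip, allowed_subnets=ALLOWED_SUBNETS):
    """Return True if client_ip falls inside any allowed subnet."""
    address = ipaddress.ip_address(client_ip)
    return any(address in subnet for subnet in allowed_subnets)


print(is_allowed("192.0.2.17"))    # inside the allowed subnet
print(is_allowed("198.51.100.5"))  # outside the allowed subnet
```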
Q: Where can I find more information about security on AWS?
For more information on security on AWS, see Amazon Web Services: Overview of Security Processes and the Windows on Amazon EC2 Security Guide.
Q: How will I be charged and billed for my use of Amazon CloudSearch?
There are no set-up fees or commitments to begin using the service. Following the end of the month, your credit card will automatically be charged for that month's usage. You can view your charges for the current billing period at any time on the AWS web site by logging into your Amazon Web Services account and clicking Account Activity under Your Web Services Account.
Q: How much does it cost to use Amazon CloudSearch?
For detailed pricing information, see Amazon CloudSearch Pricing.
Q: Is a free trial available for Amazon CloudSearch?
Yes, a free trial is available for new CloudSearch customers. For more information, see Amazon CloudSearch 30 Day Free Trial.