Search Engines & Web Crawlers
Crawling, retrieving, processing, and distributing information scoured from across the web requires a huge amount of processing power and storage in addition to the advanced algorithms to manipulate the data, create indices, and respond to user queries. Moreover, the challenge of maintaining accurate search data is compounded by the constantly changing dynamics of the web and the competitive search engine market. When contemplating building a search engine or web crawler application, consider the following questions:
Amazon Web Services (AWS) provides a reliable platform that can address the compute and storage requirements of Internet indexing and search applications.
Amazon Elastic Compute Cloud (Amazon EC2). Amazon EC2 provides resizable compute capacity on demand. The processing, algorithms, crawling, content caching, corpus creation, model and index production, system maintenance, and end user interfaces can all be hosted on Amazon EC2. This allows you to create and host your application components on standard operating systems and application environments and take advantage of the elastic nature of the AWS cloud to grow and shrink your usage as your processing needs change. Learn more
Amazon Simple Storage Service (Amazon S3). Amazon S3 provides a simple web services interface to store and retrieve any amount of data, at any time, from anywhere on the web. It is durable, highly available, and secure. Amazon S3 also stores multiple redundant copies of your data. Learn more
Amazon Relational Database Service (Amazon RDS). Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Learn more
Amazon SimpleDB. As you begin to accumulate search data, you can use Amazon SimpleDB to index and query your large datasets. Amazon SimpleDB is a web service providing the core database functions of data indexing and querying. You can write your applications to take advantage of Amazon SimpleDB’s simplicity and its ability to scale seamlessly. Amazon SimpleDB can store small amounts of data itself, and it integrates with Amazon S3 when you need to store larger objects. Learn more
Amazon Simple Queue Service (Amazon SQS). Amazon SQS provides a high performance, secure queuing system for your application that enables you to reliably distribute work between your application’s processes. Learn more
Amazon Mechanical Turk. The Amazon Mechanical Turk service is a marketplace for work that gives your application programmatic access to human intelligence. This service can be used to seek “human judgment” from within your algorithms. For example, when you identify a new web site, use Mechanical Turk to have actual humans classify the website (retail, sports, news, gaming) and provide site metadata to enable better search and discovery for your users. Learn more
Alexa. Amazon provides the Alexa Web Information Service (AWIS) and Alexa Top Sites services which deliver information and metadata about web sites. This information allows you to find information about domain registration, traffic data and site structure, as well as related links and access to historical data. Learn more about AWIS and Alexa Top Sites
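To make the division of labor among these services more concrete, here is a minimal sketch of a crawler worker that takes URLs from an Amazon SQS queue and stores the fetched pages in Amazon S3. It assumes that `sqs` and `s3` are boto3 clients (for example `boto3.client("sqs")` and `boto3.client("s3")`) and uses only their `send_message`, `receive_message`, `delete_message`, and `put_object` calls. The function names, the `pages/` key scheme, the JSON message layout, and the caller-supplied `fetch` function are all hypothetical choices made for illustration, not part of any AWS API.

```python
import hashlib
import json


def page_key(url: str) -> str:
    """Derive a stable S3 object key from a URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return f"pages/{digest}.html"


def enqueue_urls(sqs, queue_url: str, urls) -> None:
    """Put one message per URL on the queue for crawler workers to pick up."""
    for url in urls:
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"url": url}))


def process_batch(sqs, s3, queue_url: str, bucket: str, fetch) -> int:
    """Receive up to 10 messages, store each fetched page, then delete the message.

    `fetch` is a caller-supplied function that takes a URL and returns bytes.
    Returns the number of pages stored.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling, to avoid busy-waiting on an empty queue
    )
    stored = 0
    # The "Messages" key is absent when the queue had nothing to deliver.
    for message in response.get("Messages", []):
        url = json.loads(message["Body"])["url"]
        body = fetch(url)
        s3.put_object(Bucket=bucket, Key=page_key(url), Body=body)
        # Delete only after the page is stored: if the worker fails earlier,
        # the message becomes visible again and another worker can retry it.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        stored += 1
    return stored
```

Because the clients are passed in as arguments, several workers on separate Amazon EC2 instances could run `process_batch` in a loop against the same queue, and the same functions can be exercised locally with stand-in objects. A production crawler would also need error handling, politeness rules, and duplicate detection, which this sketch omits.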
Easy to use. AWS is designed to minimize much of the heavy lifting of setting up and managing your own IT infrastructure. You don’t need to purchase and configure hardware. You can get started with AWS in minutes and deploy your idea to your customers with minimal friction. You can use the AWS Management Console, a variety of third-party management tools, or the well-documented AWS web service APIs to manage and maintain your cloud infrastructure.
Flexible. AWS enables you to select the operating system, programming language, software tools, application platform, and other services you need. This eases the migration process for existing applications while preserving options for building new solutions.
Cost-Effective. You pay only for the compute power, storage, and other resources you use, with no long-term contracts or up-front commitments. For more information on analyzing the costs of using AWS, see the AWS Economics Center.
Reliable. With AWS, you take advantage of a scalable, reliable, and secure global computing infrastructure that has been honed for over a decade as the virtual backbone of Amazon.com’s multi-billion-dollar retail business.
Scalable and high-performance. Using AWS tools such as Auto Scaling and Elastic Load Balancing, your application can scale up or down based on demand. Backed by Amazon’s massive infrastructure, you have access to compute and storage resources when you need them.
Secure. AWS utilizes an end-to-end approach to secure and harden our infrastructure, including physical, operational, and software measures. For more information, see the AWS Security Center.
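The idea of scaling up or down based on demand can be illustrated with a little arithmetic. The sketch below estimates how many crawler workers would be needed to drain a backlog of queued URLs within a target time. It is not the Auto Scaling API. The function name, its parameters, and the default limits are hypothetical, and a real scaling policy would be configured in AWS rather than computed this way.

```python
import math


def desired_workers(
    backlog: int,
    per_worker_rate: float,
    target_seconds: float,
    min_workers: int = 1,
    max_workers: int = 100,
) -> int:
    """Estimate the worker count needed to clear `backlog` items in `target_seconds`.

    `per_worker_rate` is the number of items one worker handles per second.
    The result is clamped to the range [min_workers, max_workers].
    """
    if per_worker_rate <= 0 or target_seconds <= 0:
        raise ValueError("per_worker_rate and target_seconds must be positive")
    # Each worker can clear per_worker_rate * target_seconds items in the window.
    needed = math.ceil(backlog / (per_worker_rate * target_seconds))
    return max(min_workers, min(max_workers, needed))
```

For example, with an assumed 6,000 queued URLs, 5 URLs per second per worker, and a 60-second target, each worker can clear 300 URLs in the window, so the function returns 20. An empty queue falls back to `min_workers`, and a very large backlog is capped at `max_workers`.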