AWS Case Study: Recorded Future
About Recorded Future
Recorded Future is a Massachusetts-based start-up company whose mission is to record everything the world knows about the future. Founded in early 2008 by the co-founders of Spotfire (now a division of TIBCO), Recorded Future is supported by In-Q-Tel and Google Ventures, among other leading venture capital firms. The company has developed a unique technology that allows users to unlock predictive signals from unstructured text by scanning tens of thousands of high-quality online databases, websites, and publications in real time.
While the applications for this technology are extensive, Recorded Future currently focuses on three areas: the quantitative finance industry, the commercial competitive intelligence industry, and the national security and intelligence sector. Within these markets, the company’s technology helps customers identify and understand historical events based on the patterns that emerge. By organizing events around the dates mentioned in the text, the company helps customers predict what might happen in the future.
Recorded Future provides analysts with forecasting tools through two channels. First, analysts can ask temporal questions such as “Who’s traveling to Venezuela next week?” or “What events are supposed to take place over the next 10 days in Iran?” The service answers by quickly aggregating all future-oriented observations from the source materials and building a timeline of weighted opinions about the likely timing of future events. Second, Recorded Future provides curated web intelligence signals for predictive mathematical models. A historical archive of 5 billion structured, time-tagged facts from the web, covering more than two years, together with past and real-time quantifiable signals, is an invaluable asset for predictive statistical models.
The Recorded Future Premium service, an analytic environment hosted in Amazon Elastic Compute Cloud (Amazon EC2) at RecordedFuture.com, mines unstructured text from hundreds of thousands of sources on the open Web, everything from government filings to blogs to Twitter feeds, and organizes and aggregates the data to provide a picture of what the world knows about the future.
With no downloads or plug-ins required, users can interact with the data immediately through a true Web 2.0 interface or a web service API. The data and time exploration tools include momentum and sentiment measurements across multiple languages. These calculations put buzz on the web into context and measure how positive or negative that discussion is.
Why Amazon Web Services
Jason Hines, Federal Director at Recorded Future, explains the decision to go with AWS: “AWS is a great environment for startups looking to develop and test new technology without having to front hefty investment costs in hardware.” Recorded Future's index contains 5 billion time-tagged facts, all algorithmically extracted from the Web. Hines explains the data intensity: "We harvest and process hundreds of thousands of documents every hour, in real time from tens of thousands of sources so our index is always growing. Within about 10 or 15 minutes of news breaking, it’s harvested, analyzed, and available in our system."
Recorded Future uses several AWS services. Amazon Elastic Block Store (Amazon EBS) lets the company store data and move it between instances. Amazon Route 53 manages the company’s DNS, Elastic Load Balancing distributes traffic across its web servers, and Amazon Simple Storage Service (Amazon S3) stores its backups.
Today, Recorded Future runs on 40 to 70 instances, depending on load, with a total of 1 to 1.5 TB of RAM and about 100 TB of storage. With AWS, the company can scale these resources up or down as load changes. A diagram of Recorded Future's architecture is shown below.
Looking ahead, Recorded Future, a team made up mostly of software engineers, sees no need to procure or deploy its own hardware, because AWS meets its needs. When asked why the company chose AWS, Ulf Mansson, Recorded Future's Chief of Operations, concludes: “We simply haven’t found any other provider that can meet all of our diverse computing needs.”
The team credits AWS with running its entire platform, which it engineered specifically for the cloud. Knowing that a small team lacked the resources to take on such a large challenge with a typical IT infrastructure, they turned to AWS and Amazon EC2. “AWS is our architecture,” says Mansson. “Literally every component of the Recorded Future machinery utilizes AWS. The only thing we don’t use AWS for is our email and CRM.”
Now the Recorded Future engineers can focus on what they do best: applying their energy and effort to organizing the web to predict the future.
To find out more about how AWS can help you store and process big data, visit our Big Data details page: http://aws.amazon.com/big-data/.