AWS Startups Blog

Curating Financial and Business News Using NLP

In finance, there is a theory called the efficient market hypothesis. It states that stock prices incorporate all available information. Unfortunately, an asymmetry of information has always existed, and those with the most current information possess a tremendous advantage. This asymmetry is so substantial, and the advantage it confers so great, that our legal system has rightly declared insider trading illegal.

With the meteoric rise of the Internet and the subsequent Information Age, there was optimistic chatter about the rise of the retail investor. Investors could use the internet to trade just like the institutions on Wall Street and in Canary Wharf. Access to the market and the newsreels was within reach of anyone with a home internet connection. Unfortunately for the retail investor, the flood of information from the Information Age has effectively made analysis impossible without the assistance of computers.

Before machine learning (ML) and AI spread to the general public, news curation services were expensive and largely out of reach for the average retail investor. Now, however, machine learning and Big Data are changing the game again.

Enter CityFALCON

After working at large corporations like Nokia, Skype, and Microsoft, CityFALCON founder Ruzbeh Bacha decided to strike out on his own and began counting on the equity markets to provide him with a regular source of income. To be successful, Bacha quickly realized he needed to sift through the thousands of stories, forum posts, and Tweets on financial topics to find the most relevant news. He also had to sift fast, because the investment banks were getting their relevant information earlier than everyone else through curation services and entire departments staffed with well-paid research analysts. The aforementioned asymmetry of information was rampant.

Using his coding skills, Bacha built the platform that eventually morphed into CityFALCON. The company, which brings news curation services to the masses at no cost, scored its first income-generating project just six months after incorporating in 2014 and has since graduated from the Microsoft and Octopus accelerators. As of 2017, the company has also launched its service on Amazon Echo, and its scale now demands that it outsource some of its computing needs.

The CityFALCON team in Zurich, Switzerland

Getting Started

Bacha’s problem was not a lack of information. In fact, it was the opposite: a flood of information with varying degrees of relevance at high speed. When relevant news broke, the first source could be a Tweet, a news story, or a forum post. No individual could possibly monitor such a huge volume of information and identify the important pieces in a timely fashion.

Bacha also knew news stories are simply text—and text can be parsed and analyzed. This led to the concept of parsing news stories from tens of thousands of sources and ranking them for relevancy.

Today, CityFALCON allows users to build profiles and track specific topics. Using ML and natural language processing (NLP) techniques, the service delivers stories relevant to each user's interests. The company ranks the results based on their relevancy to the search terms and on the profile’s history. The results can be further refined with user-adjusted filters, increasing relevancy for any particular user’s needs at the time of search. Each result also displays the number of similar stories, indicating whether that particular topic is gaining traction or remains a mere whisper on the internet.
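To make the idea concrete, here is a minimal sketch of ranking stories by topic relevance and counting similar stories. It is an illustration only, not CityFALCON's actual method, which this post does not describe. The term-overlap score, the Jaccard similarity measure, the threshold of 0.5, and the names `relevance`, `jaccard`, and `rank` are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(story, topics):
    """Fraction of the user's tracked topic terms that appear in the story."""
    terms = {t.lower() for t in topics}
    if not terms:
        return 0.0
    return len(set(tokenize(story)) & terms) / len(terms)


def jaccard(a, b):
    """Jaccard similarity between the word sets of two stories."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank(stories, topics, similarity_threshold=0.5):
    """Return (relevance, similar_count, story) tuples, most relevant first.

    similar_count is the number of other stories whose Jaccard similarity
    to this one meets the (hypothetical) threshold.
    """
    results = []
    for i, story in enumerate(stories):
        similar = sum(
            1
            for j, other in enumerate(stories)
            if j != i and jaccard(story, other) >= similarity_threshold
        )
        results.append((relevance(story, topics), similar, story))
    # Sort by relevance first, then by how many similar stories exist.
    results.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return results


if __name__ == "__main__":
    stories = [
        "Tesla shares jump after earnings beat",
        "Tesla shares jump after strong earnings beat",
        "Local bakery opens new branch",
    ]
    for score, similar, story in rank(stories, ["tesla", "earnings"]):
        print(f"{score:.2f}  similar={similar}  {story}")
```

A production system would likely use far richer signals than word overlap, but the shape is the same: score each story against the profile, then surface how many near-duplicates exist so the user can see whether a topic is gaining traction.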

This approach produces a two-fold effect: the irrelevant information is filtered out with minimal user effort, and customers are made aware of potentially relevant information they may have otherwise overlooked.

The Technology

CityFALCON relies on modern ML and NLP techniques to mark stories as relevant and to suggest the most relevant ones to its users. The service’s 2017 release on Amazon Echo incorporates natural language understanding (NLU) concepts, too, as the Echo must understand the command and respond in an expected way. The company has published a thorough yet straightforward educational article about its use of these technologies.

The minimum viable product was built in 2014 using Ruby on Rails and Postgres. As its coverage of publications increased and corporate clients came on board, the company had to re-architect its infrastructure and now uses the non-relational database Cassandra along with Elasticsearch.

Like other tech firms, CityFALCON relies on its users to improve its ML algorithms. Users can mark stories as relevant, liked, or disliked, and those streams of information flow back into the company’s backend to improve its scoring algorithms. The company’s dedication goes further, though, as it is building its own taxonomy specifically for financial language processing.
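As a rough illustration of how such a feedback loop might work, the sketch below nudges per-term weights up or down whenever a user marks a story. This is not CityFALCON's scoring algorithm. The signal values, the learning rate, and the `FeedbackScorer` class are hypothetical choices made for this example.

```python
import re

# Hypothetical signal strengths for each kind of user feedback.
FEEDBACK_SIGNAL = {"relevant": 1.0, "liked": 0.5, "disliked": -1.0}


class FeedbackScorer:
    """Toy scorer that learns per-term weights from user feedback."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # term -> learned weight

    @staticmethod
    def _terms(text):
        """Unique lowercase word tokens in the text."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def record(self, story, feedback):
        """Shift the weight of every term in the story by the feedback signal."""
        signal = FEEDBACK_SIGNAL[feedback]
        for term in self._terms(story):
            self.weights[term] = (
                self.weights.get(term, 0.0) + self.learning_rate * signal
            )

    def score(self, story):
        """Sum of learned weights for the terms in the story."""
        return sum(self.weights.get(term, 0.0) for term in self._terms(story))


if __name__ == "__main__":
    scorer = FeedbackScorer()
    scorer.record("Oil prices surge", "relevant")
    scorer.record("Celebrity gossip roundup", "disliked")
    print(scorer.score("Oil prices fall"))   # positive: shares liked terms
    print(scorer.score("Celebrity news"))    # negative: shares disliked terms
```

Each piece of feedback adjusts future scores a little, so stories resembling ones the user valued rise in the ranking while those resembling disliked ones sink.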

As the company's user base and number of monitored sources have grown, so has its need for computational power. The company aggregates and curates more than a million news stories a day, is getting ready to onboard hundreds of thousands of users from across the globe, and needs to respond to hundreds of millions of API calls each month with very low latency.

In Summary

The unending march of technology will surely change the landscape yet again. For now, the do-it-yourself short-term trader can leverage CityFALCON’s services to compete with the titans of finance on the news curation front and make informed decisions without suffering the disadvantages of information asymmetry. Long-term investors stand to benefit from CityFALCON’s profile-specific recommendations, which serve otherwise overlooked, yet still relevant, stories, posts, and tweets in a timely and, most importantly, manageable way.

The flood of data and information will only increase as more of the world comes online. Individuals simply cannot monitor everything without the assistance of computers, and CityFALCON aims to provide that assistance for even the tightest of budgets.