AWS Startups Blog
Yewno Uses AWS and ML to Analyze Vast Amounts of Data
The mass digitization of content, together with internet search capabilities, has put an unprecedented amount of information at our fingertips. Typing keywords into a search engine can produce a list with millions of results spread over thousands of pages. But how do you identify the best, most useful information buried in that sea of data?
Yewno’s sophisticated AI, built with AWS, analyzes millions of information sources in real time. Rather than simply hunting for keywords, the startup’s algorithms read text, understand context and meaning, and explain why things are connected. A search for “depression,” for example, can distinguish among a geological depression, psychological depression, and the Great Depression, allowing users to find the content that’s most relevant to them.
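To make the “depression” example concrete, here is a toy sketch of context-based disambiguation. It is not Yewno’s algorithm, which the article does not describe in detail; it swaps in a very simple cue-word overlap score, and the sense names, cue words, and function names are all hypothetical.

```python
import re

# Hypothetical sense inventory: each sense of "depression" is described by a
# few hand-picked cue words. These lists are illustrative assumptions only.
SENSES = {
    "geological": {"basin", "terrain", "crater", "landform", "valley", "erosion"},
    "psychological": {"mood", "therapy", "symptoms", "anxiety", "patients", "mental"},
    "economic": {"1929", "unemployment", "stock", "market", "banks", "economy"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(text, senses=SENSES):
    """Return the sense whose cue words overlap most with the text.

    Returns None when no cue word appears, i.e. the context gives no signal.
    """
    tokens = tokenize(text)
    scores = {sense: len(tokens & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("The stock market crash of 1929 deepened the depression"))
    print(disambiguate("Patients with depression reported low mood and anxiety"))
```

A production system would learn these associations from the content itself rather than from a fixed word list, but the principle of scoring a term against its surrounding context is the same.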
“Anything you see that’s of interest to you, you can click on, you can interrogate, you can dig down to the next level,” says Ruth Pickering, Yewno’s co-founder and Chief Operating Officer. “That’s a very unique thing.”
She likens traditional search engines based on natural language processing to having a screwdriver, while Yewno’s technology sets you up with a more robust set of tools. “If all you need to do is screw in a screw, it’s perfect. It does the exact right thing,” Pickering says. “But if you want to do a range of things, particularly when you’re dealing with text, you probably want to have brought a toolkit.”
When Yewno was founded in 2015, the initial challenge was building the startup’s content base. Not only did the company need high-quality content, it also needed a platform that could handle massive amounts of data and scale up quickly as the company grew. AWS proved to be a critical element of Yewno’s success.
“We started building our content pipeline, and pretty early on we selected AWS,” says Pickering. “There’s no way that, early on, we could have afforded to build the infrastructure ourselves.” The cloud infrastructure could easily handle new content sets, consisting of tens of millions of records that needed to be ingested, and it has allowed Yewno to develop more sophisticated content pipelines as well.
“We scrape the internet and grab news, we’ll process it through our pipeline, and it’ll be available for our users in about five minutes,” says Brendan Volheim, Yewno’s chief technology officer. “That scalability was great within AWS. It’s quite amazing that we’re able to process things so quickly.”
Yewno also deploys machine learning algorithms across a range of AWS services, including Amazon EC2, EMR, DynamoDB, and Kinesis, to support the platform. Kinesis provides robust message passing between processes, EMR processes the resulting streams of big data (using Spark Streaming), and the results are stored in DynamoDB. EC2 instances handle auxiliary pre-processing steps.
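The flow of data through those four services can be sketched in a few lines. The example below is an in-memory stand-in, not AWS code: a queue plays the role of the Kinesis stream, a batch function plays the role of the Spark Streaming job on EMR, and a dictionary plays the role of the DynamoDB table. Every class and function name is hypothetical, and the word-count step is only a placeholder for whatever analysis the real pipeline performs.

```python
from collections import deque


class Stream:
    """In-memory FIFO queue standing in for a Kinesis stream (assumption)."""

    def __init__(self):
        self._records = deque()

    def put(self, record):
        self._records.append(record)

    def read_batch(self, max_records):
        """Remove and return up to max_records records, oldest first."""
        batch = []
        while self._records and len(batch) < max_records:
            batch.append(self._records.popleft())
        return batch


def preprocess(raw):
    """Auxiliary clean-up step (the role the article assigns to EC2)."""
    return {"id": raw["id"], "text": " ".join(raw["text"].split()).lower()}


def process_batch(batch):
    """Micro-batch step standing in for a Spark Streaming job on EMR.

    Word counting is a placeholder for the real analysis.
    """
    return [{"id": d["id"], "word_count": len(d["text"].split())} for d in batch]


def run_pipeline(raw_docs, batch_size=2):
    """Push documents through preprocess -> stream -> batch job -> table."""
    stream = Stream()
    table = {}  # dict keyed by document id, standing in for a DynamoDB table
    for raw in raw_docs:
        stream.put(preprocess(raw))
    while True:
        batch = stream.read_batch(batch_size)
        if not batch:
            break
        for item in process_batch(batch):
            table[item["id"]] = item
    return table


if __name__ == "__main__":
    docs = [{"id": "a", "text": "  Markets   rallied today "}, {"id": "b", "text": "Rain"}]
    print(run_pipeline(docs))
```

Decoupling the stages with a stream is what lets each one scale independently: producers keep writing while the batch job drains records at its own pace.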
“The expansive scalability allowed by AWS enables us to ingest real-time data through our ML pipeline without any degradation of service as well as no bottleneck of data in our pipeline processes,” says Pickering. “Previously this would have required dedicated infrastructure that would not have been cost effective due to the dynamically changing environment and content volumes.”
That infrastructure is also key to Yewno’s long-term ambitions. Rather than focusing on a specific subject area, such as law, finance, or life science, Yewno aims to be able to read different types of content and provide insights into a wide variety of subject areas. “We think great discovery in the future will come from the interchange and intersection of domains,” Pickering says.
To achieve that, the company is continually working to expand and increase its content base, scaling even faster and ingesting even more content from new data sources. Not having to worry about the infrastructure side of things has been a huge benefit.
“The nice thing about working with Amazon is that there’s been no roadblock that we’ve run into from Amazon’s side of things,” says Volheim.
Knowledge Is Power
Helping people make efficient, informed decisions lies at the heart of the company’s latest product, Yewno Edge.
Making decisions based on a small subset of information, after all, might not lead to the best outcome. Yewno Edge is designed to solve this problem. An AI-driven investment research platform, leveraging services such as EMR and ECS, Yewno Edge reads everything. The Edge terminal is like having a team of analysts at your fingertips. Research that would normally have taken a team of people weeks to put together can now be completed in just minutes—which can save companies time and money.
Together with AWS, Yewno is planning to streamline its own processes as well, merging its education, publishing, biomedical, finance, and other content pipelines into a single pipeline. It also hopes to gather content from even more information sources, both historical and new, and to cover more regions of the world.
“I think the thing that’s most important is to continually expand and increase the content base,” Pickering says. “And we’ll continue to add further modules, further components and further tools that help people find what they’re looking for.”