AWS Startups Blog
GraphLab: Big Data Analytics Scaled From Inspiration to Production
GraphLab, Inc., is the company behind a complete platform for using scalable machine learning to build big data analytics products. Companies like Zillow, Adobe, Zynga, Pandora, Bosch, and ExxonMobil rely on GraphLab to turn big data inspiration into predictive applications in production, including recommender systems, fraud detection systems, and sentiment and social network analyzers, among other applications and services.
Carlos Guestrin is the CEO and cofounder of GraphLab and the Amazon Professor of Machine Learning at the University of Washington. A world-recognized leader in the field of machine learning, Carlos was named one of the 2008 “Brilliant 10” by Popular Science magazine, received the 2009 IJCAI Computers and Thought Award for his contributions to Artificial Intelligence, and garnered a Presidential Early Career Award for Scientists and Engineers (PECASE).
What is machine learning and how has it evolved over the past 10 years?
Machine learning is a science that advances the idea that computers can be programmed to “learn” from patterns in data and use that knowledge as a basis for making highly accurate predictions and decisions in an automated way. Over the past decade, we have seen machine learning manifested in applications that enable self-driving cars, online stores that recommend products we’re likely to buy, targeted marketing, and credit card fraud detection, to name a few. The variety and volume of data now available have put machine learning at the forefront of investment, because it promises to transform our “big data” into insights that improve all areas of life and business.
Can you share the story behind GraphLab? How did it come to be?
GraphLab began in 2008 at Carnegie Mellon University under my stewardship and that of my co-founding doctoral and post-doctoral students. The team had been working on the application of machine learning for advanced graph analysis and required more scalable tools to implement the work they were publishing and sharing. The tools they built became so popular that a modest workshop to discuss them drew over 300 participants, 10 times more than anticipated. This outcome pointed to both an unmet need and a well-designed product platform. What the team had done was leverage Amazon EC2 and significant advances in graph representation, asynchronous communication, and scheduling to achieve orders-of-magnitude performance gains over alternative systems for graph analysis.
Fast forward to 2012, a time during which my wife Emily, also a Computer Science professor, and I were considering new job opportunities. We had been speaking with universities out east when Jeff Bezos stepped in to help with the UW recruiting effort. The Amazon founder and Chief Executive met with us both and subsequently established two endowed professorships in machine learning to help fund our salaries.
With that, I became the new Amazon Professor of Machine Learning at UW and moved to the Pacific Northwest, bringing with me some of my talented students and aspirations of making a significant impact in the emerging area of big data analytics. A year later, with funding from Madrona Ventures and NEA, GraphLab the company was born, and a short year after that, in March 2014, GraphLab’s first commercial offering, GraphLab Create™, was released in beta form.
Tell us about GraphLab Create and how it simplifies big data analysis.
Transforming raw data into insights and building predictive applications is a laborious and complex process today. Such endeavors require data scientists or similarly knowledgeable software engineers and an array of disparate and complex tools to gather, clean, model, analyze, and ultimately present those insights to some store or application. In many instances the process is made lengthier and more expensive by the need to reimplement the prototype in code that can be used in a production environment. This situation leaves many data scientists hamstrung by lack of programming experience and organizations unable to derive value from their data.
GraphLab makes a platform that provides all of the tools a data scientist needs to go from an inspiring idea to a production-worthy data product quickly. Current users report that GraphLab Create helps them be immensely more productive and deliver value more quickly while requiring less programming expertise and fewer personnel.
And it supports predictive applications that can be deployed on AWS?
The journey from raw data to business-transforming predictive analytics often starts with a data scientist, a laptop, and a prototype, but it invariably requires a critical proof-of-concept stage to test the model at scale, likely in the cloud. For many, the journey is cut short here: scaling to AWS is easy enough, but reimplementing a prototype as production-ready code is not.
That’s where GraphLab comes in. GraphLab Create can be run entirely in the AWS environment. Data scientists can take their GraphLab-built prototype from a laptop to AWS in seconds by changing a single line of code. Data sets of any size as well as models can be loaded and accessed from Amazon S3. GraphLab also provides tools for deploying, monitoring, and optimizing data pipelines and predictive services across AWS clusters.
What are some common use cases? How are people using GraphLab Create?
The most common uses of GraphLab Create span a variety of disciplines:
- Retail: recommender systems and pricing prediction (e.g., airfare)
- Financial services: fraud detection through behavior and transaction analysis
- Biomedicine: disease prediction by medical records analysis, personalized drug design
- Telecommunications: prediction of customer churn
- Social network analysis: identification of key network and community influencers
- Marketing and media: sentiment analysis, targeting
Are only enterprise companies using predictive applications?
No, not at all. State and local governments use GraphLab to analyze citizen sentiment and pinpoint which areas of local infrastructure need the most immediate attention. Biomedical research teams have used GraphLab to analyze clinical notes in the prediction of patient propensity toward a particular disease. Sensor networks of all types help provide valuable data whose analysis can make for safer air and rail travel. Generally, governments, research organizations, and health and service providers are all mirroring industry’s desire to put their data to work in improving the effectiveness of their processes and people.
So it’s important for early-stage companies to be considering data science? When is the right time for a startup to think about big data?
Data science and data-driven decision making are key considerations for companies of every size. Large companies are updating their dated customer recommender systems to take advantage of more advanced predictive techniques that include real-time inputs, not just purchase histories. Text and sentiment analysis of surveys and comment fields is helping boost customer satisfaction and reduce churn. Similarly, early-stage companies that have data analytics–based business models are likely to begin life with data scientists and the application of machine learning at the core of their prototypes. This is particularly true of firms in the sales and marketing as well as media and advertising verticals that are innovating in customer targeting, acquisition, and retention. Other newer startup categories for which data science is central include those creating highly specialized predictive services customized for a specific vertical and application, for instance predicting financial waste in health care, optimizing supply chains, or detecting insurance claim fraud.
What all of these firms, large and small, have in common is lots of data but few data science resources and very limited compute. This is where the true power of GraphLab with AWS comes into view. With the scaling barrier removed, big data finally moves past hype to a very real source of inspired products.
Ten years from now how do you predict machine learning will be used to drive big data insights?
In a decade, machine learning will be accessible to many more people than the data scientists and skilled engineers who are the most productive with it today. Business analysts and line-of-business owners, for instance, will come to rely on predictive services for near real-time access to conditions that affect profit. Service providers in the government, health care, and private sectors will be able to customize products to the needs of individuals. It is also likely that awareness of machine learning and its impact on data-driven decision making will become mainstream enough for nontechnical persons to understand its value as a significant differentiator of products and services.
What’s next for GraphLab?
GraphLab is on a journey to democratize machine learning and aims to be instrumental in actualizing the ten-year vision discussed. In the short term, we are looking forward to making version 1.0 of our flagship product, GraphLab Create, generally available on October 15. With this initial commercial offering, the power of data science will be delivered to the hands of every organization, and many more big data aspirations will make their way to production by means of GraphLab.