SwiftKey is a language technology start-up best known for its award-winning consumer application, which aids touchscreen typing by offering personalized predictions and corrections. The company has around 100 staff, is headquartered in London, and has a presence in San Francisco, Seoul, and Beijing. The most recent version of the app includes SwiftKey Cloud, a secure, cloud-based hub for users' personalized language insights, and SwiftKey Flow, a gesture input technology that lets users ‘flow’ entire phrases without lifting their finger, simply by gliding over the space bar between words. SwiftKey was the best-selling paid application on Google Play in 2012. SwiftKey also makes its technology available to OEMs via a Software Development Kit (SDK). Currently, SwiftKey supports 60 languages. SwiftKey users have entered or “flowed” more than 1.5 trillion characters to date and have saved themselves over 450 billion keystrokes.
SwiftKey makes its predictions based on custom language models that can be personalized with the individual user's language habits. To do this, the personalization service generates a language model from content that users have composed and stored in services such as Gmail, Twitter, and Facebook. The resulting model must reach the user’s phone quickly enough to speed up their typing wherever they are in the world. Dr Sebastian Spiegler, SwiftKey’s Data Team Lead, explains: “We collect and analyze tens of terabytes of web crawled data to create language models and build cloud services, like personalization, for millions of active users. To do this, we need a highly scalable, multi-layered system that can grow with steadily increasing demands. We are forecast to be on more than 100 million devices by the end of 2013.”
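How a general model and a personal model might be combined can be illustrated with a toy example. The sketch below is an assumption, not SwiftKey's published method: it builds bigram next-word probabilities from a general corpus and from a user's own text, then ranks suggestions by linear interpolation, `weight * P_personal + (1 - weight) * P_general`. The names `bigram_model`, `predict`, `weight` and `top_n` are hypothetical.

```python
from collections import Counter, defaultdict


def bigram_model(text):
    """Build P(next | previous) from raw counts in whitespace-tokenized text.

    Hypothetical helper: a real personalization service would use far richer
    models; this only shows the idea of a per-user model.
    """
    tokens = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {w: c / total for w, c in nexts.items()}
    return model


def predict(prev, general, personal, weight=0.5, top_n=3):
    """Rank next-word candidates by interpolating two models (assumed technique):
    score(w) = weight * P_personal(w | prev) + (1 - weight) * P_general(w | prev).
    """
    g = general.get(prev, {})
    p = personal.get(prev, {})
    scores = {
        w: weight * p.get(w, 0.0) + (1 - weight) * g.get(w, 0.0)
        for w in set(g) | set(p)
    }
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:top_n]]


if __name__ == "__main__":
    general = bigram_model("see you soon . see you later . see you tomorrow . see you soon")
    personal = bigram_model("see you laters mate . see you laters")
    print(predict("you", general, personal, weight=0.5))  # ['laters', 'soon', 'later']
```

With `weight=0.0` the user's own habits are ignored and the general model's ranking (`soon`, `later`, `tomorrow`) comes back; raising `weight` lets the user's habitual “laters” move to the top.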
SwiftKey needs a powerful processing engine for the artificial intelligence technology that generates the predictions. According to Spiegler, “The office-based server had become a bottleneck for processing, and there were two options for getting around it. One was setting up server racks in our office. This meant we would depend on local Internet access, requiring dedicated system administrators and making it impossible to move office without interrupting service. The other was to move to cloud computing.” In addition, Spiegler says, “Our customer-facing services could not easily scale with the exponential growth in our user base that is now measured in tens of millions.”
Spiegler explains why Amazon Web Services (AWS) was chosen: “SwiftKey’s concerns were not primarily cost-savings but time-to-market for new features and services as well as reliability of those services. AWS was already a major player at that time, setting standards for cloud computing, so it was a natural choice.” SwiftKey had used AWS in its early start-up stage, before purchasing office-based servers, and had relied on a smaller hosting provider for its customer-facing services. The company decided to move these services to AWS because, as Spiegler explains, “AWS could deliver a higher level of service, more flexibility in terms of ad-hoc scalability, and more control over actual instances.”
SwiftKey uses client-based, patented Natural Language Processing techniques to understand the relationship between words, as well as Machine Learning to adapt on-the-fly to a user’s personal language and interaction style. Predictions are based on language models that assign a probability to words and short phrases given their context. These models are created by a language pipeline that processes tens of terabytes of stored text. SwiftKey uses Amazon Elastic MapReduce (Amazon EMR) to run the more than fifteen pipeline steps, which include language classification, general cleaning, and tokenization, followed by sequencing to count how often words and short phrases occur. The language models with assigned probabilities are generated after additional post-processing steps. The company has also developed a custom workflow management system that automates pipeline runs and decides whether each step actually needs to be run.
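A workflow manager that skips unnecessary steps can be sketched in a few lines of Python. This is a hypothetical illustration, not SwiftKey's system: the step functions `clean`, `tokenize` and `count_ngrams` stand in for the real pipeline stages, and the assumed skip rule is that a step is not re-run when a SHA-256 fingerprint of its input matches the one recorded on its previous run. The cleaning step is deliberately crude and English-only.

```python
import hashlib
import json
import re
from collections import Counter


def clean(lines):
    """Hypothetical cleaning step: lower-case and replace anything that is not
    an ASCII letter, apostrophe or space with a space (English-only)."""
    return [re.sub(r"[^a-z' ]+", " ", line.lower()) for line in lines]


def tokenize(lines):
    """Hypothetical tokenization step: split each cleaned line into words."""
    return [line.split() for line in lines]


def count_ngrams(token_lists, n=2):
    """Hypothetical sequencing step: count each n-gram as a space-joined string."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return dict(counts)


def fingerprint(data):
    """Stable hash of a step's input, used to detect whether it has changed."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()


class Workflow:
    """Runs steps in order, skipping any step whose input is unchanged."""

    def __init__(self, steps):
        self.steps = steps   # list of (name, function) pairs
        self.cache = {}      # name -> (input fingerprint, output)
        self.executed = []   # names of the steps actually run in the last call

    def run(self, data):
        self.executed = []
        for name, func in self.steps:
            key = fingerprint(data)
            cached = self.cache.get(name)
            if cached is not None and cached[0] == key:
                data = cached[1]            # input unchanged: reuse previous output
            else:
                data = func(data)           # input new or changed: run the step
                self.cache[name] = (key, data)
                self.executed.append(name)
        return data


if __name__ == "__main__":
    wf = Workflow([("clean", clean), ("tokenize", tokenize), ("count", count_ngrams)])
    print(wf.run(["The cat sat.", "The cat ran!"]))  # all three steps run
    wf.run(["The cat sat!", "The cat ran!"])
    print(wf.executed)  # ['clean']: cleaned text is identical, so later steps are skipped
```

In the second run only the punctuation changed, so `clean` re-runs but produces the same output and the downstream steps are skipped. Hashing entire inputs is only practical at toy scale; a system processing terabytes would probably compare input manifests or timestamps instead.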
To process multiple terabytes of data, SwiftKey uses a hosted Hadoop framework running on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). Metadata is stored in several databases, including MongoDB, MySQL, and Amazon SimpleDB. As Spiegler explains, “Using Amazon EMR, we can spin up any number of Hadoop clusters within minutes. The service also allows us to set up large-scale experiments. For example, to optimize parameters we have to test thousands of possible combinations. Using AWS, we can do this and get results in hours rather than days or weeks.”
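The parameter sweep Spiegler describes can be illustrated at toy scale. The sketch below assumes an interpolated add-k bigram model, which the source does not specify, and scores every combination of two hypothetical parameters, the add-k constant `k` and the interpolation weight `lam`, by perplexity on held-out text (lower is better). The functions `train`, `prob`, `perplexity` and `grid_search` are illustrative names only.

```python
import itertools
import math
from collections import Counter

UNK = "<unk>"  # placeholder for words never seen in training


def train(tokens):
    """Collect the counts an interpolated add-k bigram model needs (assumed model)."""
    return {
        "vocab": set(tokens) | {UNK},
        "unigrams": Counter(tokens),
        "bigrams": Counter(zip(tokens, tokens[1:])),
        "contexts": Counter(tokens[:-1]),  # how often each word precedes another
        "total": len(tokens),
    }


def prob(model, prev, word, k, lam):
    """lam * add-k bigram estimate + (1 - lam) * add-one unigram estimate."""
    v = len(model["vocab"])
    bigram = (model["bigrams"][(prev, word)] + k) / (model["contexts"][prev] + k * v)
    unigram = (model["unigrams"][word] + 1) / (model["total"] + v)
    return lam * bigram + (1 - lam) * unigram


def perplexity(model, tokens, k, lam):
    """Held-out perplexity; words unseen in training are mapped to <unk>."""
    mapped = [t if t in model["vocab"] else UNK for t in tokens]
    pairs = list(zip(mapped, mapped[1:]))
    log_prob = sum(math.log(prob(model, p, w, k, lam)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))


def grid_search(train_tokens, heldout_tokens, ks, lams):
    """Score every (k, lam) combination; return (best_params, all_scores)."""
    model = train(train_tokens)
    scores = {
        (k, lam): perplexity(model, heldout_tokens, k, lam)
        for k, lam in itertools.product(ks, lams)
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    train_text = "the cat sat on the mat the dog sat on the rug".split()
    heldout = "the cat sat on the rug".split()
    best, scores = grid_search(train_text, heldout, [0.01, 0.1, 1.0], [0.5, 0.7, 0.9])
    print(best, round(scores[best], 2))
```

Each `(k, lam)` pair is evaluated independently, so in a distributed setting the combinations could be split across cluster nodes; this sketch simply evaluates them one after another.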
For automated deployment, the company uses a Chef server hosted on Amazon EC2, in which packages and services are combined into cookbooks and package versions are controlled by recipes. Instance types and security configurations are easy to change. “Combining Chef with an Amazon EC2 snapshot, we can easily create or replicate production and test environments,” says Spiegler. “For research and development projects, we also use an open source DevOps library called Pallet (palletops.com) to create and configure Amazon EC2 instances. This allows us to use exactly the same deployment scripts to configure both our Amazon EC2 instances and local virtual machines, meaning that we can test and develop offline, then deploy to AWS when we actually want a service to be externally available or require bigger hardware.”
Using AWS, SwiftKey has been able to scale services on demand during peak times, such as product or feature launches, which would not have been possible before. In addition, by using Amazon CloudFront and Amazon Route 53, SwiftKey can reliably serve users anywhere in the world from the EU (Ireland) Region. According to Spiegler, SwiftKey is refining its approach to using AWS: “Initially, we started with multiple AWS accounts, one per project. AWS Identity and Access Management (IAM) allows us to set permissions for individual users and groups of users, and we are now moving our services to fewer development and production accounts in order to simplify system management.”
“Using AWS allows developers to take care of services,” says Spiegler, “and means we have access to easy scalability. We do not need to manage Hadoop clusters ourselves, so we can focus on the actual development without the hassle of administration.”
To learn more about how AWS can help with your data needs, visit our Big Data details page: http://aws.amazon.com/big-data/.