AWS Public Sector Blog

Oxford University Press makes high-quality language data available using AWS

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service.

Oxford University Press (OUP) is a department of the University of Oxford and the largest university press in the world. In 2015, OUP launched the Oxford Global Languages (OGL) initiative, which aims to build lexical resources for 100 of the world’s languages and make them freely available online. The initiative places a special focus on digitally under-represented languages: languages spoken by millions of native speakers, but whose lexical resources on the Internet may be of poor quality or unavailable. To date, OUP has launched 20 language websites, covering languages as diverse as Hindi, Indonesian, Latvian, Quechua, and Zulu.

OUP provides its world-renowned dictionary content via the Oxford Dictionaries API. This gives programmatic access to its data and allows developers to integrate high-quality language data into their products. New languages and functionalities are regularly launched in line with customer feedback and the OGL initiative. OUP also licenses its language data to global tech companies and its data is embedded in applications ranging from e-readers, games, and educational software to predictive text, search, and machine learning (ML).
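To make "programmatic access" concrete, the following is a minimal sketch of how a client might prepare a lookup request. The base URL, the `/entries/{lang}/{word}` path, and the `app_id`/`app_key` header names are assumptions here and should be checked against the current Oxford Dictionaries API documentation; `build_entry_request` is a hypothetical helper, and the credentials are placeholders. The sketch only builds the request and does not send it.

```python
from urllib.parse import quote

# Assumed base URL; verify against the current API documentation.
BASE_URL = "https://od-api.oxforddictionaries.com/api/v2"


def build_entry_request(word, lang, app_id, app_key):
    """Return the URL and headers for a single dictionary-entry lookup.

    Hypothetical helper: the path layout and header names are assumptions.
    """
    # Percent-encode the headword so spaces and non-ASCII characters
    # are safe to place in the URL path.
    path_word = quote(word.lower(), safe="")
    url = f"{BASE_URL}/entries/{lang}/{path_word}"
    headers = {
        "app_id": app_id,    # placeholder credential
        "app_key": app_key,  # placeholder credential
        "Accept": "application/json",
    }
    return url, headers


url, headers = build_entry_request("ice cream", "en-gb", "my-app-id", "my-app-key")
print(url)
```

The returned pair could then be passed to any HTTP client, for example `urllib.request.Request(url, headers=headers)`, and the JSON response parsed with the standard `json` module.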

Working beyond a minimum viable product

Both the OGL initiative and the Oxford Dictionaries API started as minimum viable products (MVPs) with the launch of the first languages and API endpoints. However, if successful, these projects would need to scale quickly to accommodate dozens of language datasets and endpoints. As data is core to the OUP business and licensing data is a major revenue stream for the Dictionaries department, OUP needed a reliable, secure, and scalable solution for storing, transforming, and delivering data to its users and customers.

Launching new language websites and API endpoints on a regular basis while serving existing customers also required a deployment process that could update code, data, and infrastructure with no downtime.

Finding the scalability and flexibility needed

OUP knew that on-premises solutions wouldn’t provide the scalability and flexibility required for developing an MVP and expanding it in case of success. OUP chose Amazon Web Services (AWS) because it matched the requirements around scalability and flexibility, provided managed services for storing and accessing data securely, and offered options for deployment and automation.

The system architecture, explained

The system architecture evolved as the project reached maturity. In the MVP phase, OUP used AWS Elastic Beanstalk running a multi-container Docker configuration, Amazon Relational Database Service (Amazon RDS) for storing data, an Amazon Elasticsearch Service (Amazon ES) cluster deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances for search functionality, and an ELK stack for logging.

The first language websites and API endpoints were launched with this architecture, which allowed OUP to quickly develop a scalable solution, take advantage of managed services for storing data, and deploy new releases with no downtime using the blue/green deployment strategies offered by Elastic Beanstalk.

As the project evolved, the system architecture was reshaped. Because the data was served as pure JSON, it was consolidated into an Amazon ES domain. Kubernetes was chosen as the container orchestrator for its scalability, granularity, and deployment options. The home-brew ELK stack was replaced by a separate Amazon ES domain, and infrastructure security was tightened with the use of private subnets and a bastion host.
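As an illustration of serving dictionary data as JSON straight from a search domain, here is a minimal sketch of a query body in the Elasticsearch query DSL. The field names `headword` and `language`, the sample values, and the helper `headword_query` are hypothetical; the article does not describe OUP's actual index mapping.

```python
import json


def headword_query(headword, language, size=10):
    """Build a query-DSL body for an exact headword lookup.

    Hypothetical helper: the field names are assumptions, not OUP's schema.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter clauses match exactly and skip relevance scoring.
                "filter": [
                    {"term": {"headword": headword}},  # assumed field name
                    {"term": {"language": language}},  # assumed field name
                ]
            }
        },
    }


body = headword_query("ubuntu", "zu")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the `_search` endpoint of an index, and the stored JSON documents come back under `hits.hits[]._source`, so the API layer can return them with little or no transformation.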

The figure below illustrates the architecture in production:

OUP architecture in production diagram

Deployments to production follow one of two paths. Simple code changes use in-place rolling updates via Kubernetes. Data and infrastructure changes use a blue/green deployment, in which Amazon Route 53 switches the DNS to a new Kubernetes cluster.
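The two deployment paths can be sketched as plain data structures. This is a minimal sketch under stated assumptions, not OUP's actual configuration: the record name, target hostname, TTL, and surge settings are hypothetical, and both helper functions are invented for illustration. The first dict mirrors the `strategy` section of a Kubernetes Deployment spec; the second mirrors the `ChangeBatch` structure that the Route 53 `ChangeResourceRecordSets` API accepts.

```python
def rolling_update_strategy(max_unavailable=0, max_surge=1):
    """Return the `strategy` section of a Kubernetes Deployment spec.

    With max_unavailable=0, old pods are only removed once their
    replacements are ready, so capacity never drops during a release.
    """
    return {
        "type": "RollingUpdate",
        "rollingUpdate": {
            "maxUnavailable": max_unavailable,
            "maxSurge": max_surge,
        },
    }


def dns_switch_batch(record_name, new_target, ttl=60):
    """Return a Route 53 change batch pointing a CNAME at the new cluster.

    UPSERT creates the record if it is missing, or overwrites it otherwise.
    """
    return {
        "Comment": f"Blue/green switch of {record_name} to {new_target}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # a short TTL shortens the cut-over window
                    "ResourceRecords": [{"Value": new_target}],
                },
            }
        ],
    }


strategy = rolling_update_strategy()
# Hypothetical hostnames, for illustration only.
batch = dns_switch_batch("api.example.org.", "green-cluster.example.org.")
```

With boto3 installed and credentials configured, the batch could be applied with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, where the hosted zone ID is left for the reader to supply. Rolling back a blue/green release is then the same call with the previous cluster's hostname.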

OUP plans to experiment with Amazon Elastic Container Service for Kubernetes (Amazon EKS) to manage the Kubernetes infrastructure (Amazon EKS was not available when OUP started using Kubernetes).

OUP’s benefits of working in the cloud

AWS helped OUP scale from an MVP to mature products and services that fulfil its mission and generate revenue. In particular, engineers in the Dictionaries department were empowered by AWS and took responsibility for, and control of, software development, system architecture, and deployment processes. This led to a quicker time to market: the first languages and API endpoints were launched within 4-6 months of the programme’s inception.

The use of strategies like blue/green deployments enabled engineers to release new languages, features, and bug fixes on a regular basis. Before using AWS, OUP used to release new data every six months via processes that lasted weeks and involved several departments and third-party suppliers. Using the tools offered by AWS for automation and deployments, this changed to an average of 1-2 deployments per month in line with business and marketing requirements.

The use of, and transition to, managed services was a major benefit for productivity and growth. As the engineering team became more familiar with technologies like Amazon ES, moving towards managed services proved more efficient, because they provide an easier way to scale and peace of mind for upgrades and deployments. AWS managed services also free up time for the engineering team: engineers can concentrate on developing new features and processing new data instead of maintaining infrastructure.
