AWS Public Sector Blog
How governments can deliver national data more securely and at scale
This is a guest post from Epimorphics, an AWS Partner.
It’s a challenge to modernize vast public datasets to meet new and evolving uses. Epimorphics helps government agencies deliver big data projects efficiently. Government data underpins many important public policy decisions and private sector investments. For example, house price indices influence mortgage lending, and river flow measurements guide flood planning. The data for this vital work must be authoritative, accessible, and available digitally.
However, these vast datasets often have elements dating back more than 100 years, and it’s hard to modernize them without interrupting the public and commercial activities that depend on them. It’s a challenge we tackle every day at Epimorphics, a company based in the United Kingdom (UK) that designs, builds, and runs data services on Amazon Web Services (AWS).
Two of the projects we support are the Environment Agency’s Hydrology Data Explorer and the UK House Price Index (HPI). Both illustrate how cloud-based architecture and open standards can help public agencies upgrade critical data infrastructure while maintaining transparency, performance, and cost control.
The scale and stewardship challenge
Few datasets illustrate the demands on modern public infrastructure as clearly as hydrology. The Environment Agency’s service provides access to more than 5 billion readings from thousands of monitoring points across England. The oldest readings date from the nineteenth century, and new measurements arrive several times an hour from each monitoring point.
Scale is only part of the challenge. The archive covers multiple systems and eras, so records vary in structure, completeness, and sampling frequency. It’s a public sector reality that data created for a specific purpose often gets used for something entirely different decades later. By then, the people who set up the original projects have often moved on, and their knowledge is not always retained within the institution.
Today’s users—from water companies to universities and emergency services—expect fast, stable access through automated tools and browser-based interfaces. We’ve learned that datasets must be designed to evolve without a total rebuild after each new requirement. Good metadata, open standards, and clear governance help avoid legacy silos and keep datasets useful after the original project team has gone.
A hybrid approach to data architecture
National datasets require an architecture that can cope with volume, variety, and evolving needs. For the Hydrology Data Explorer, we use a hybrid design that separates the measurements from the metadata that makes them intelligible. River activity readings sit in a traditional relational store, while a linked data layer describes the context—sites, variables, quality flags, and sampling intervals—that keeps the data traceable and reusable as needs evolve. A query for a river’s daily mean flow, for example, retrieves its values from the relational layer while drawing descriptive detail from the metadata model.
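As a rough illustration of this hybrid pattern, the sketch below combines a relational query for the numeric series with a SPARQL lookup for its descriptive context. The table, endpoint, and vocabulary names are assumptions made for the example, not the production schema.

```python
# Minimal sketch of the hybrid pattern: values from a relational store,
# context from a metadata graph. All names and endpoints are illustrative.
import psycopg2                                   # relational layer (measurement values)
from SPARQLWrapper import SPARQLWrapper, JSON     # linked data layer (context)

def daily_mean_flow(station_id: str, start: str, end: str):
    # 1. Pull the numeric time series from the relational store.
    conn = psycopg2.connect("dbname=hydrology")   # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT reading_date, value
            FROM daily_mean_flow                  -- hypothetical table
            WHERE station_id = %s AND reading_date BETWEEN %s AND %s
            ORDER BY reading_date
            """,
            (station_id, start, end),
        )
        readings = cur.fetchall()

    # 2. Pull the descriptive context (label, unit) from the metadata graph
    #    so the numbers stay interpretable and traceable.
    sparql = SPARQLWrapper("https://example.org/hydrology/sparql")  # hypothetical endpoint
    sparql.setQuery(f"""
        SELECT ?label ?unit WHERE {{
            ?measure <http://example.org/def/station> <http://example.org/id/{station_id}> ;
                     <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
                     <http://example.org/def/unit> ?unit .
        }}""")
    sparql.setReturnFormat(JSON)
    context = sparql.query().convert()["results"]["bindings"]

    return {"context": context, "readings": readings}
```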
These components are brought together through a single application programming interface (API), which serves both browser users and automated tools. Responses are streamed directly, so researchers can pull full historical series, sometimes decades long, without overloading the system. The same model underpins our approach to the UK HPI, where consistency of access matters as much as scale.
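The streaming idea can be sketched as follows, assuming a Flask service backed by a PostgreSQL-compatible store; the route, table, and connection details are illustrative only. A server-side cursor hands rows to the client in batches, so even a decades-long series never has to be held in memory at once.

```python
# Hedged sketch of response streaming for long historical series.
from flask import Flask, Response, stream_with_context
import psycopg2

app = Flask(__name__)

@app.route("/readings/<station_id>.csv")
def readings(station_id):
    def generate():
        conn = psycopg2.connect("dbname=hydrology")      # hypothetical DSN
        # A named (server-side) cursor fetches rows in batches rather than
        # loading the whole series before responding.
        with conn, conn.cursor(name="stream") as cur:
            cur.itersize = 10_000
            cur.execute(
                "SELECT reading_date, value FROM readings "  # hypothetical table
                "WHERE station_id = %s ORDER BY reading_date",
                (station_id,),
            )
            yield "date,value\n"
            for date, value in cur:
                yield f"{date},{value}\n"
    return Response(stream_with_context(generate()), mimetype="text/csv")
```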
Cloud-based deployment on AWS provides the elasticity and resilience these services require. We use Amazon Elastic Kubernetes Service (Amazon EKS) to orchestrate workloads because it spreads processing across multiple Availability Zones. Measurement data is stored in Amazon Aurora, which provides a resilient, PostgreSQL-compatible relational layer, while scheduled ingest and background processing tasks run on AWS Lambda. Amazon Simple Queue Service (Amazon SQS) coordinates incoming data, which is particularly important because new telemetry and historic corrections often arrive simultaneously.
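A simplified version of the ingest step might look like the handler below: an AWS Lambda function triggered by Amazon SQS that upserts readings into the relational store, so late corrections can overwrite earlier telemetry for the same station and timestamp. The message format, environment variable, and table are assumptions made for the sketch.

```python
# Sketch of an SQS-triggered AWS Lambda ingest handler; names are illustrative.
import json
import os
import psycopg2

def handler(event, context):
    """Consume a batch of telemetry or correction messages from Amazon SQS."""
    conn = psycopg2.connect(os.environ["AURORA_DSN"])    # hypothetical env var
    with conn, conn.cursor() as cur:
        for record in event["Records"]:                  # standard SQS event shape
            reading = json.loads(record["body"])
            # Upsert so a historic correction replaces the earlier value.
            cur.execute(
                """
                INSERT INTO readings (station_id, reading_time, value)
                VALUES (%s, %s, %s)
                ON CONFLICT (station_id, reading_time)
                DO UPDATE SET value = EXCLUDED.value
                """,
                (reading["station"], reading["time"], reading["value"]),
            )
    return {"batchItemFailures": []}
```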
Given that demand increasingly comes from crawlers (automated programs that systematically browse the internet) and from AI services, we manage traffic through a blend of right-sized capacity and quality-of-service rules. This helps prioritize legitimate users during peak load. Elastic Load Balancing (ELB) distributes requests across replicated services, which helps maintain the high availability that public agencies and industry users need.
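One common way to express such quality-of-service rules is a token bucket per client class, as in the hedged sketch below; the classes and rates shown are purely illustrative, not our production policy.

```python
# Illustrative token-bucket admission control: interactive users keep
# priority when crawler or AI traffic surges. Rates are example values.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Interactive users get a higher sustained rate than bulk crawlers.
buckets = {"interactive": TokenBucket(50, 100), "crawler": TokenBucket(5, 10)}

def admit(request_class: str) -> bool:
    return buckets.get(request_class, buckets["crawler"]).allow()
```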
Precision and resilience for national statistics
The UK HPI is published by HM Land Registry with our support. It is an accredited national statistic, and the project highlights the importance of timing. Releases must be updated accurately within a narrow publication window and, crucially, must not be publicly visible before then.
To meet these requirements, we built an automated publishing pipeline. Data arrives in private areas for review and validation, after which each release is prepared across a set of replicated databases. At publication time, ELB switches traffic from the old dataset to the new one within minutes without the need to restart servers or interrupt ongoing queries.
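Conceptually, this is a blue/green switch. The sketch below shows how such a cutover could be scripted with the AWS SDK for Python (Boto3) by repointing an Application Load Balancer listener at the target group holding the validated release; the ARNs are placeholders.

```python
# Hedged sketch of a blue/green publication switch via Elastic Load Balancing.
import boto3

elbv2 = boto3.client("elbv2")

LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/..."                 # placeholder
GREEN_TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/green"  # placeholder

def publish_new_release():
    """Repoint the listener at the replicas holding the validated release."""
    elbv2.modify_listener(
        ListenerArn=LISTENER_ARN,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": GREEN_TARGET_GROUP_ARN}],
    )
```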
This modernized service turns what was once a collection of spreadsheets into a bilingual, programmatic, linked data service. Over time, users have built their own processes and products around this consistency: Developers use the API to power dashboards and analysis tools, journalists and analysts can link directly to the authoritative value for a given region and month, and organizations such as insurers and mortgage providers rely on the predictable publication cycle for their modeling. Long-term reliability matters as much as technical sophistication.
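To give a flavor of that kind of lookup, the hedged example below fetches an index value for a region and month over HTTP. The URL pattern and response field are hypothetical stand-ins, not the published HM Land Registry endpoints.

```python
# Hypothetical client lookup of an authoritative index value.
import requests

def hpi_value(region: str, month: str) -> float:
    """Fetch the index value for a region and month as JSON."""
    url = f"https://example.org/ukhpi/region/{region}/month/{month}.json"  # placeholder
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()["indexValue"]      # hypothetical field name

print(hpi_value("england", "2024-01"))
```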
Lessons in digital transformation
In our experience, several lessons apply across government data services:
- Start with purpose, not products – Agencies need to define the decisions that depend on the data before choosing a technology solution. The most advanced offerings won’t help if they don’t match the underlying need.
- Favor open standards and interoperability – Linked data principles and FAIR data practices (making information findable, accessible, interoperable, and reusable) support dataset evolution without locking future teams into rigid schemas.
- Invest in stewardship – Data infrastructure relies on people who understand and maintain it. Funding the building blocks—reference data, metadata models, APIs—can reduce long-term costs and make future projects faster and more predictable.
Future-ready infrastructure
Both the hydrology and UK HPI offerings inform future work, from groundwater and rainfall datasets to potential air quality services. Emerging AI techniques also create new opportunities.
Well-structured graph metadata can support Retrieval Augmented Generation (RAG) for AI systems, helping them locate and interpret relevant source data. Our unified API model further supports experimentation so developers can build agents that navigate datasets without overloading the service.
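As a conceptual sketch of that RAG pattern, the example below retrieves dataset descriptions from a metadata graph and folds them into a prompt as grounding context. The SPARQL endpoint, vocabulary, and prompt wiring are assumptions for illustration.

```python
# Conceptual RAG sketch: retrieve graph metadata first, then ground the
# model's answer in it. Endpoint and vocabulary are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

def dataset_context(question_terms: list[str]) -> str:
    sparql = SPARQLWrapper("https://example.org/hydrology/sparql")   # hypothetical
    sparql.setQuery("""
        SELECT ?dataset ?comment WHERE {
            ?dataset <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
        } LIMIT 20""")
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    # Keep only descriptions mentioning the user's terms, joined into a
    # context block the model can cite when answering.
    relevant = [r["comment"]["value"] for r in rows
                if any(t.lower() in r["comment"]["value"].lower() for t in question_terms)]
    return "\n".join(relevant)

prompt = ("Answer using only the dataset descriptions below.\n\n"
          + dataset_context(["river", "flow"])
          + "\n\nQuestion: Which dataset holds daily mean river flow?")
```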
With expectations for transparency and automation on the rise, these projects suggest a wider blueprint: long-term, cloud-based data services that remain accessible, reliable, and adaptable. They show how public bodies can modernize critical datasets while preserving trust, and how well-designed infrastructure can support the next wave of innovation.
To find out more about how governments are using AWS to address their challenges, visit International Central Government on AWS.
