AWS Public Sector Blog

Developing a modern data-driven strategy in the public sector

Public sector organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. Data-driven decisions lead to more effective responses to unexpected events. For government organizations, a more detailed understanding of citizen ambitions and requirements creates a better citizen experience.

While data is abundant and growing rapidly, just producing and storing this data doesn’t automatically create value. As public sector organizations accumulate a significant amount of data, much of that data lives in different silos. These silos can require different platforms, different management, different security, and different authorization approaches. All of this increases the operational risk and the operational cost – and can make it difficult to analyze data holistically. Also, these systems are typically not built for exponential growth of event data like log files, click stream data, and machine-generated data from internet of things (IoT) devices.

To overcome these challenges, organizations are shifting toward building a data-driven organization based on a modern data strategy.

The five key areas of a modern data strategy

The modern data strategy focuses on five key areas:

  1. Data product mindset – This means adopting a product-oriented mindset versus a platform-oriented one that is typically seen in traditional data strategies. Product orientation means that we design and create data-enabled offerings that take into account business and technical requirements in order to solve business problems and positively affect the citizen experience.
  2. Business and technology ownership – Owners of traditional data strategies tend to be technology leaders. A modern data strategy is owned by both the business and technology leaders jointly. Integrating business and technology in this way reflects the importance of data.
  3. Agility – A modern data strategy is agile, building and refining data products in an iterative process of testing, experimenting, and learning to inform the next revisions. In a traditional data strategy, teams first gather all the requirements known upfront and spend time building the foundation before delivering tangible business value to the stakeholders. In a modern data strategy, the iterative, agile approach can reduce time to value and deliver the foundation incrementally. Learn more about the agile approach for developing government services.
  4. Governance – According to modern data-driven practices, organizations federate or distribute governance to balance nonnegotiable security, privacy, and regulation concerns with the need to innovate. In a more traditional strategy, teams may create organizational constructs where everything must be tightly controlled by a centralized team, restricting innovative developments for one team’s needs.
  5. Technology – Purpose-built data stores and analytics services that are based on business needs allow organizations to build cloud-based platforms that are scalable and resilient. In contrast, a traditional data strategy can often take a one-size-fits-all approach to data store and analytics services, regardless of the actual need. The scalability limitations of on-premises environments can slow down agility and innovation.

Figure 1. A high level comparison between a modern and traditional data strategy in five key areas: mindset, ownership, artifacts, governance, and technology.

Modernizing data strategy with data architectures on AWS

To shift into this new paradigm, public sector organizations are rapidly modernizing their data architectures. To support this, organizations can build the foundation on a scalable data lake using Amazon Simple Storage Service (Amazon S3). A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure it, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning (ML), to guide better decisions. Services like Amazon Athena, a serverless interactive query service, can analyze the data directly inside the Amazon S3 data lake. In addition to the data lake, organizations can use a combination of purpose-built database and analytics services to support different use cases, like relational data, non-relational data, or big data processing.
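
As a minimal sketch of this query-in-place pattern, the snippet below composes a SQL statement and submits it to Athena with the AWS SDK for Python (boto3). The database, table, and S3 paths are hypothetical examples, and the submission step is kept in a separate function with a lazy boto3 import, since it requires AWS credentials to run.

```python
def build_daily_counts_query(database: str, table: str, day: str) -> str:
    """Compose a SQL statement that Athena can run directly against
    files in the S3 data lake, with no data loading step."""
    return (
        f'SELECT event_type, COUNT(*) AS events '
        f'FROM "{database}"."{table}" '
        f"WHERE event_date = DATE '{day}' "
        f"GROUP BY event_type"
    )


def run_on_athena(sql: str, database: str, output_s3: str) -> str:
    """Submit the query; results are written to the given S3 location.
    boto3 is imported lazily so the pure query-building logic above
    can be exercised without AWS credentials."""
    import boto3  # requires the AWS SDK and valid credentials

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]


# Hypothetical database and table names for illustration only.
sql = build_daily_counts_query("citizen_services", "service_requests", "2023-01-15")
```

In production the query would more likely be parameterized through Athena prepared statements rather than string formatting; the sketch only shows the shape of the call.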

For instance, organizations can use Amazon EMR as a cloud big data solution for petabyte-scale data processing and interactive analytics, running ML and open-source frameworks such as Apache Spark, Apache Hive, and Presto. Amazon OpenSearch Service helps perform interactive log analytics, real-time application monitoring, and website search. Amazon Redshift is a cloud data warehouse that uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, helping organizations break down data silos and gain real-time and predictive insights on data with no data movement or transformation.
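
To make the EMR piece concrete, one common pattern is submitting a Spark job to a running EMR cluster as a "step" via boto3. The cluster ID, script location, and arguments below are hypothetical; the pure step-building helper can be inspected without AWS access, while the submission needs credentials.

```python
def spark_step(name: str, script_s3: str, args: list[str]) -> dict:
    """Describe one Spark job as an EMR step. EMR's command-runner
    invokes spark-submit on the cluster with the given script."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3, *args],
        },
    }


def submit_steps(cluster_id: str, steps: list[dict]) -> list[str]:
    import boto3  # lazy import: needs the AWS SDK and credentials

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


# Hypothetical script path and S3 locations for illustration only.
step = spark_step(
    "aggregate-iot-events",
    "s3://example-bucket/jobs/aggregate_events.py",
    ["--input", "s3://example-bucket/raw/", "--output", "s3://example-bucket/curated/"],
)
```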

AWS Glue is a serverless data integration service that can discover, prepare, and integrate data from multiple sources. It helps organizations to move and transform data seamlessly between all their data stores and support unified data access.
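
A Glue transformation job is typically kicked off with a small set of job parameters. The sketch below, with hypothetical job name and S3 paths, shows Glue's convention of passing parameters as "--NAME" keys (read inside the job script via getResolvedOptions); the run itself is behind a lazy boto3 import since it needs AWS credentials.

```python
def glue_run_arguments(source_path: str, target_path: str) -> dict:
    """Build the Arguments map for a Glue job run. Glue passes job
    parameters as '--NAME' keys to the job script."""
    return {
        "--SOURCE_PATH": source_path,
        "--TARGET_PATH": target_path,
        # Job bookmarks let Glue process only data it has not seen yet.
        "--job-bookmark-option": "job-bookmark-enable",
    }


def start_transform(job_name: str, arguments: dict) -> str:
    import boto3  # lazy import: needs the AWS SDK and credentials

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


# Hypothetical S3 locations for illustration only.
args = glue_run_arguments(
    "s3://example-bucket/raw/permits/",
    "s3://example-bucket/curated/permits/",
)
```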

To manage the unified governance of the modern data architecture, organizations can use services like AWS Lake Formation and Amazon DataZone to manage security and governance at scale, enable fine-grained permissions across data lakes, break down data silos, and make all data discoverable with a centralized data catalog.
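
As one concrete example of fine-grained permissions, Lake Formation can grant a principal SELECT access on specific columns of a catalog table. The role ARN, database, table, and column names below are hypothetical; the helper builds the grant request as plain data, and the AWS call is isolated behind a lazy boto3 import.

```python
def table_select_grant(principal_arn: str, database: str, table: str,
                       columns: list[str]) -> dict:
    """Build a Lake Formation grant restricted to specific columns,
    so a team can query only the fields it is authorized to see."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant: dict) -> None:
    import boto3  # lazy import: needs the AWS SDK and credentials

    boto3.client("lakeformation").grant_permissions(**grant)


# Hypothetical role and catalog names; note no personal-data columns
# are included in the grant.
grant = table_select_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "citizen_services", "service_requests",
    ["request_id", "category", "created_at"],
)
```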

Ultimately, modern data architecture is about integrating the data lake and data warehouse with the most appropriate purpose-built data and analytics stores to drive holistic insight across the business for all users.

Figure 2. The data journey through a modern data platform built on AWS. Starting with ingesting any type or amount of data, then using the purpose-built analytics and artificial intelligence (AI) and ML services to deliver insights.

How the Royal Netherlands Meteorological Institute modernized their data strategy with AWS

One of the public sector pioneers who successfully shifted from a traditional data strategy toward a modern data strategy by applying the data-driven organization mindset and processes is the Royal Netherlands Meteorological Institute (KNMI). KNMI is the Dutch national weather service. It provides weather forecasting services and monitors changes in climate, air quality, and seismic activity. These services have a significant impact on nearly all citizens in daily life, across aviation, shipping, traffic, agriculture, and more. In order to drive this paradigm shift toward a modern data strategy, KNMI worked on two main areas.

As part of a change in mindset and ownership practices, KNMI first established the KNMI DataLab team, with the aim of creating a collaboration platform between data scientists, domain experts, and external third-party partners (e.g., data providers and platform providers). This platform allows them to collaboratively use data and technology to solve business challenges and use cases (e.g., seismic event detection and classification, slippery road detection, fog detection). To help drive the data analytics and artificial intelligence (AI) and ML strategy and use cases, the DataLab team created an internal data science community with representatives from the business domains and the technology department. KNMI's data strategy is owned by both business and IT to create data products that help solve business problems. This is in line with the recommendation of the AWS Data-Driven Everything (D2E) program, through which AWS experts work with an organization's business and IT groups to help shape data strategy and create data-enabled offerings that provide tangible value to the business.

In order to run their big data engineering workloads and train and build ML models, KNMI needed a cloud provider that supported scaling up or down based on business needs, with purpose-built analytics services and data stores to match the variety of data they process without compromising on the performance or the cost. Jan Willem Noteboom, head of DataLab at KNMI, said, “We chose AWS because of the flexibility of configuration (size adjustments in a few clicks), breadth and depth of the analytics and AI/ML services, extensive documentation and examples, and the ease of CI/CD from development to staging to operation. All of that saved days and hours for my team and gave them additional time to focus on the differentiated work that contributes with a real value to the business.”

KNMI DataLab used AWS to create a minimum viable product (MVP) that can detect and classify seismological events, including earthquakes, acoustic bombs, nuclear activity, and more. KNMI uses ML models to detect and classify these events using seismogram time-series data coming from seismograph stations located all around the country. For the data storage component, KNMI uses Amazon Aurora to store the detection information (structured data) and Amazon S3 to store the seismogram data (object store). KNMI uses AWS Fargate to run the Docker containers to perform the ML model inferences.
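
The shape of such an inference flow can be sketched as below. This is an illustrative outline only, not KNMI's actual code: the function and field names are hypothetical, the classifier is a placeholder for the trained ML model, and the S3 fetch is behind a lazy boto3 import. In the architecture described above, the returned detection row would then be written to Aurora.

```python
import datetime


def classify(waveform: bytes) -> tuple[str, float]:
    """Placeholder for the trained seismic-event classifier; a real
    implementation would run the ML model on the waveform bytes."""
    return ("earthquake", 0.0)


def detection_record(station: str, event_type: str, confidence: float,
                     onset_utc: datetime.datetime) -> dict:
    """Normalize one model inference into the row stored in the
    relational detection table."""
    return {
        "station": station,
        "event_type": event_type,          # e.g. 'earthquake', 'sonic_boom'
        "confidence": round(confidence, 3),
        "onset_utc": onset_utc.isoformat(),
    }


def run_inference(bucket: str, key: str) -> dict:
    """Container entry point sketch: fetch one seismogram object from
    S3, classify it, and return the detection row."""
    import boto3  # lazy import: needs the AWS SDK and credentials

    waveform = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    event_type, confidence = classify(waveform)
    station = key.split("/")[0]  # assumes keys are laid out as station/<file>
    return detection_record(station, event_type, confidence,
                            datetime.datetime.now(datetime.timezone.utc))
```

Packaged in a Docker image, an entry point like this is what Fargate would run on a schedule or per incoming object, without the team managing any servers.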

The new solution will improve the automatic detection of acoustic events (e.g., explosions, mine quarry blasts, sonic booms), which is not possible using the current methods. The new approach uses a single station, compared with the array of stations required in the current deployment. This will improve the efficiency of the detection process and open new possibilities to detect new types of events from both current data and the historical archive of waveforms.

Using AWS, KNMI developed the MVP implementation quickly, benefiting from the flexibility, scalability, purpose-built data analytics services, and breadth and depth of the AWS modern data platform.

Getting started with a modern data strategy

A data-driven public sector is one that puts data to work to improve the citizen experience, by using data to drive decision-making processes, eliminate data silos, and make data available for innovation to reinvent public services. A cloud-based modern data infrastructure on AWS can position organizations to adapt more quickly to changing markets and citizens' needs.

Learn more about how to start your data analytics journey on AWS and unlock the value of your data by adopting a modern data strategy and architecture.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.

Please take a few minutes to share insights regarding your experience with the AWS Public Sector Blog in this survey, and we’ll use feedback from the survey to create more content aligned with the preferences of our readers.

Samer Madfouni

Samer leads the analytics and artificial intelligence (AI) and machine learning (ML) business development for Amazon Web Services (AWS) across Europe, the Middle East, and Africa. His focus is to help customers reinvent their business using data, and become data-driven by adopting modern data analytics strategies and platforms. Samer has more than 16 years of experience advising on and building analytics solutions across different sectors. He has a bachelor's degree in artificial intelligence and a master's degree in business administration. He holds the AWS Certified Data Analytics – Specialty and AWS Certified Machine Learning – Specialty certifications.

Jan Willem Noteboom

Jan Willem works at the Royal Netherlands Meteorological Institute (KNMI). Since 2016, he has managed KNMI DataLab, a team of data science experts that explores and implements data-driven innovations for KNMI science and services using artificial intelligence (AI) and machine learning (ML) and data from both traditional and novel data sources. His experience covers analysis and development of information systems and managing projects to deliver systems. He has strong knowledge of data architectures, data analytics and modelling, and geographical information standards. Jan Willem holds an MSc in Electrical Engineering.