AWS Public Sector Blog

Driving public sector innovation: Building a data lakehouse and analytics platform on AWS for public housing


This post was written by Amazon Web Services (AWS) in collaboration with Timothy Peh from the Housing & Development Board (HDB). In this post, we explore how HDB worked with AWS Professional Services to design and build the ARK data lakehouse on AWS, detailing the data strategy, platform architecture, and the benefits realized.


In Singapore, public sector agencies are transforming how they use technology and data to deliver better services, respond to citizens’ needs more quickly, and plan for the future. As part of this digital innovation journey, agencies are embracing cloud adoption, advanced analytics, and integrated data platforms to enhance policymaking and service delivery.

Key to this effort is improving data accessibility for public service officers, reducing the time to collect and analyze data (often referred to as “time to data”), and equipping them with self-service analytics tools to transform real-world data into actionable insights.

Achieving these outcomes requires robust, scalable, and centralized data infrastructures capable of handling increasingly complex and growing volumes of information.

How data shapes Singapore’s public housing

One agency embracing this journey for digital innovation is the Housing & Development Board (HDB).

As Singapore’s public housing authority, HDB plans and develops homes and towns for nearly 80 percent of the nation’s resident population, creating inclusive, livable, and sustainable living environments. Data plays a vital role in fulfilling this mission, informing how HDB designs and builds homes that meet residents’ evolving needs and enhance their daily living experience.

HDB’s data strategy rests on four pillars that together form a modern, agile, and user-centric data ecosystem. These pillars empower HDB officers to transform data into actionable insights in their day-to-day operations, drive smarter planning, and deliver better outcomes for residents:

  • DataOps culture – Orchestrating data pipelines through automation, amplifying feedback loops through observability, and enabling quick experimentation through a secure sandbox environment and continuous integration and continuous delivery (CI/CD) pipelines.
  • Serverless and microservice architecture – Reducing operational overheads while improving productivity, scalability, flexibility, and agility in delivering data products.
  • Federated data mesh operating model – Decentralizing data stewardship by business domain, supported by a centralized data platform, shared engineering capabilities, and federated governance guardrails.
  • Data-as-a-service business model – Delivering data seamlessly to multiple touchpoints through well-established solution patterns.

Through this strategy, HDB launched the Analytical Repository of Knowledge (ARK) initiative—a decisive step to strengthen data management and analytics across the organization. ARK powers agile, insight-driven decision-making, enabling HDB to respond more effectively to residents’ evolving needs and enhance the quality of its housing and town planning efforts.

Platform architecture

The HDB ARK platform employs a modular architecture, using AWS Cloud services to create distinct but interconnected building blocks. This design delivers a robust foundation for data management, processing, and governance. The key components include:

  • Foundational cloud infrastructure – The foundational layer of the ARK platform was designed with a security-first approach, built on core AWS services for compute, storage, and networking. These managed services provide an inherently secure and scalable environment from the ground up, establishing a baseline that allows HDB teams to focus on data-level security and application logic rather than underlying hardware management.
  • Data management and processing core – This block encompasses the essential capabilities for handling the data lifecycle within the ARK platform. It includes automated pipelines for data onboarding from various sources, tools for data transformation and enrichment, versatile storage options supporting different data types, and the framework for creating and managing curated analytical data products.
  • Unified access and governance layer – This component provides a consistent interface for users to interact with the platform’s data assets. It integrates functionalities like a central data catalog, powered by comprehensive metadata management, for data discovery; standardized workflows for requesting data access; and mechanisms for enforcing data governance policies and standards across the platform. Many of these capabilities are surfaced to end users through the user-friendly ARK portal interface.

 

Figure 1: Conceptual layers of the HDB ARK platform

High-level architecture

To bring the ARK data lakehouse vision to life, HDB used a suite of AWS managed services, enabling the creation of a scalable, resilient, and secure platform while minimizing infrastructure management. The collaboration with AWS Professional Services was instrumental in architecting the solution using the following key services, grouped by function (as shown in the following figure):

  • Data ingestion and preparation
    • AWS Transfer Family – Provides secure mechanisms for file transfers into the data lakehouse.
    • AWS Database Migration Service (AWS DMS) – Facilitates efficient migration of data from various source databases.
    • AWS Glue – Used for its serverless extract, transform, and load (ETL) capabilities to process and transform incoming data, and AWS Glue Data Catalog serves as the central metadata repository.
    • AWS Step Functions – Orchestrates complex ETL workflows and data processing pipelines.
    • Amazon EventBridge – Manages event-driven automation and scheduling of data pipelines.
    • Amazon Simple Storage Service (Amazon S3) – Acts as the core storage layer for the data lakehouse, holding raw, curated, and transformed data. It supports various storage formats, including open table formats suitable for large-scale analytics.
    • Amazon DynamoDB – A NoSQL database used for operational metadata and tracking within data pipelines.
  • Data access control
    • AWS Lake Formation – Enables fine-grained access control and governance policies over data stored in Amazon S3 and cataloged in AWS Glue.
    • Amazon Athena – Enables interactive SQL queries to be run directly against data stored in Amazon S3.
  • Observability and security
    • Amazon CloudWatch – Provides monitoring, logging, and alerting for platform resources and applications.
    • AWS CloudTrail – Records API calls for auditing, governance, and compliance purposes.
    • AWS Secrets Manager – Securely manages database credentials and other secrets required by the platform.
  • Supporting platform components
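To make the orchestration pattern above concrete, the following is a minimal sketch of an Amazon States Language (ASL) definition, expressed as a Python dict, in which AWS Step Functions runs a single AWS Glue job synchronously. The job name, state name, and retry settings are hypothetical illustrations, not ARK's actual pipeline configuration:

```python
import json

# Minimal Amazon States Language (ASL) definition that runs one AWS Glue
# job and waits for it to finish. "curate-housing-data" is a hypothetical
# placeholder job name, not an actual ARK pipeline.
state_machine = {
    "Comment": "Sketch: orchestrate one Glue ETL job",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # The ".sync" suffix tells Step Functions to wait for the
            # Glue job run to complete before moving on.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "curate-housing-data"},
            "Retry": [
                {
                    "ErrorEquals": ["Glue.ConcurrentRunsExceededException"],
                    "IntervalSeconds": 60,
                    "MaxAttempts": 3,
                }
            ],
            "End": True,
        }
    },
}

# Serialize to the JSON form that Step Functions actually consumes.
definition_json = json.dumps(state_machine, indent=2)
print(definition_json[:40])
```

In a real deployment, this JSON would be passed to Step Functions (for example, via infrastructure-as-code templates), and Amazon EventBridge rules would trigger the state machine on a schedule or in response to data-arrival events.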

 

Figure 2: High-level architecture using AWS services
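The architecture above mentions Amazon DynamoDB as a store for operational metadata and pipeline tracking. As a hedged sketch of what such a tracking record might look like, the helper below builds an item in DynamoDB's low-level attribute-value format; all attribute names and values here are hypothetical illustrations, not ARK's actual schema:

```python
import uuid
from datetime import datetime, timezone

def make_run_record(pipeline: str, source: str, status: str = "RUNNING") -> dict:
    """Build a pipeline-run tracking item in DynamoDB's attribute-value
    format ({"S": ...} marks a string attribute). The attribute names
    (pipeline_id, run_id, ...) are hypothetical, not ARK's real schema."""
    return {
        "pipeline_id": {"S": pipeline},        # partition key
        "run_id": {"S": str(uuid.uuid4())},    # sort key, unique per run
        "source_system": {"S": source},
        "status": {"S": status},
        "started_at": {"S": datetime.now(timezone.utc).isoformat()},
    }

record = make_run_record("ingest-resident-feedback", "sftp-transfer-family")
# A real pipeline step would then persist this with boto3:
#   dynamodb.put_item(TableName="pipeline-runs", Item=record)
print(record["status"]["S"])
```

Keeping per-run state in a table like this lets orchestration steps check whether a source has already been processed and gives operators a simple audit trail alongside CloudWatch logs.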

Platform strategy and benefits

The ARK platform strategy is centered on delivering a robust and user-centric data environment.

A comprehensive data cataloging system is central to this, making it easier to discover and understand data assets through effective metadata management. Alongside this, the platform integrates a security-first design with built-in guardrails, supporting data protection and adherence to compliance mandates. Robust governance for data quality, security, and compliance is also an integral part of the platform, which is essential for reliably scaling data products.

Beyond these foundational elements, ARK emphasizes self-service enablement. This gives HDB’s domain teams the tools to independently create, manage, and deploy data products, thereby accelerating their impact. The design also prioritizes seamless integration, fostering interoperability across diverse data sources and systems within HDB’s ecosystem. Enhanced collaboration is further facilitated by mechanisms for effective sharing of data and metadata across teams. Finally, the platform benefits significantly from extensive automation and DevOps. This involves implementing infrastructure as code (IaC) and automated CI/CD pipelines (handling ETL scripts, AWS Glue jobs, and so on), allowing HDB’s data engineers to deploy updates rapidly and consistently while reducing manual effort and operational risk.
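To illustrate the infrastructure-as-code approach described above, the following is a minimal sketch of a CloudFormation template, built as a Python dict, that declares a single AWS Glue job. The resource name, script path, and imported role are hypothetical placeholders; in practice such templates would be generated and deployed by the CI/CD pipeline rather than written by hand:

```python
import json

# Sketch of an infrastructure-as-code fragment: a CloudFormation template
# declaring one AWS Glue job. All names and paths below are hypothetical
# placeholders, not ARK's actual resources.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "CurateGlueJob": {
            "Type": "AWS::Glue::Job",
            "Properties": {
                "Name": "curate-housing-data",
                # Role ARN exported by a separate security/IAM stack.
                "Role": {"Fn::ImportValue": "ArkGlueJobRoleArn"},
                "GlueVersion": "4.0",
                "Command": {
                    "Name": "glueetl",  # Spark ETL job type
                    "ScriptLocation": "s3://example-bucket/scripts/curate.py",
                },
            },
        }
    },
}

print(json.dumps(template, indent=2)[:60])
```

Versioning templates like this in source control means every change to a Glue job or pipeline is reviewed, tested, and rolled out through the same automated path, which is what makes the rapid, consistent deployments described above possible.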

Currently in its first phase, the ARK platform is already demonstrating significant technical benefits, including highly scalable storage, end-to-end pipeline automation, and reduced infrastructure overhead. Crucially, these advancements accelerated the time to data by 80 percent, which meant HDB analysts and planners could more rapidly take advantage of consolidated data. Consequently, they could derive valuable insights for data-driven decisions, understand evolving resident needs, and analyze housing demand trends. HDB aims to continue scaling the platform to support more users across agencies in meeting their data needs.

“With the support of AWS Professional Services, HDB took a major leap forward in our data journey with the development of ARK, boosting our capacity to harness data while significantly reducing our time to insight. The solution was built with security and scalability at its core, greatly enhancing how HDB leverages data to inform both policy and operational decisions,” said Tay Jun Jie, HDB’s deputy director of Data Management & Data Science.

Conclusion

The ARK data lakehouse, built on AWS, marks a significant step in HDB’s digital transformation journey. Through collaboration with AWS Professional Services and the use of managed services, HDB has created a modern, scalable, and secure platform with enhanced efficiency. By implementing a governed data lakehouse founded on a DataOps culture and a federated data mesh model, HDB is advancing its vision of a modern, data-driven organization equipped for the future. The platform now empowers its officers with timely, trustworthy data to enhance public service delivery for the nation. Looking ahead, HDB’s vision is to scale the ARK platform, incorporating advanced analytics and generative AI capabilities to derive deeper insights and further innovate.

To learn more and get started, contact your AWS account team or the AWS Public Sector team.

Timothy Peh

Timothy is the tech lead for the Data Engineering & Platform team within HDB's Data Science & Artificial Intelligence Centre of Excellence (DSAI COE). A strong advocate of data-driven decision-making, Timothy leads the cloud migration and transformation journey for HDB's data and analytics landscape to empower business users to harness data effectively in pursuit of organizational goals.

Yun Yi Lim

Yun Yi is a delivery consultant at AWS Professional Services, where she helps customers solve their business problems using AWS. She focuses on enterprise architectures and applications, especially for communication applications and public sector systems.