AWS Partner Network (APN) Blog

Gaining Operational Insights of the Australian Census with AWS

By Ali Khoshkbar, Cloud Architect – AWS
By Aaron Brown, Principal Engineer – Shine Solutions
By James Ireland, Account Executive – AWS

In early August, millions of people took part in the 2021 Census across Australia, providing a comprehensive picture of the country’s economic, social, and cultural makeup.

The Australian Bureau of Statistics (ABS) ran the 2021 Census on Amazon Web Services (AWS), and insights into the performance and uptake of the Census were provided to the ABS.

An operational insights (OI) platform based on the Serverless Data Lake Framework (SDLF) provided business intelligence that helped inform the 2021 Census operations across the country.

The ABS received a large amount of traffic, with more than 9.6 million forms in total, of which 7.6 million were submitted online. The ABS needed high-quality data insights to understand and respond to challenges affecting Census completion across the country, so it could allocate resources most efficiently.

This included gathering detailed information from field staff about the dwellings being visited, from specific follow-up instructions to safety hazards. Sharing this type of information between staff and managers provided clear insights into the conduct of the Census field work and minimized re-work for enumeration staff.

The OI platform—built in partnership with AWS Professional Services, Shine Solutions, ARQ Group, and the ABS—achieved this goal of providing near real-time insights into a very complex logistical activity.

In this post, we discuss the high-level architecture of the OI platform, and how it was able to pull data from multiple disparate sources and serve it to PowerBI and a single-page application (SPA).

We’ll dive into a specific aspect of the architecture, demonstrating a pattern of executing SQL in Amazon Redshift and using Amazon Aurora as a fast-caching layer to improve user experience and reduce cost.

Finally, we’ll discuss how AWS, Shine, and the ABS worked together to design, build, and operate a well-architected system that delivered business value for a workload of national importance.

Architecture

The operational insights platform needed to pull data from various sources, including the Census’ frontend website and serverless backend, ABS’ on-premises Oracle databases, an Amazon Connect-based automated call center, AWS security services, and more. It also needed to scale to handle large volumes of data.

There was a real possibility of a high-scale distributed denial of service (DDoS) incident on the Census, so the OI platform had to be able to ingest data generated from hundreds of thousands of connections per second. A serverless architecture was a natural fit for these requirements.

There are three main layers in a typical data lake: the ingestion layer, the storage and transformation layer, and the serving layer.

Figure 1 – The operational insights platform’s high-level architecture.

For the ingestion layer, AWS Database Migration Service (AWS DMS) was used to pull data from ABS’ on-premises Oracle databases and file systems. AWS Lambda was used to pull data from the Census Digital Service (CDS) and other sources. These services ingested data into Amazon Simple Storage Service (Amazon S3).

This is where services in the storage and transformation layer took over. AWS Step Functions was used as a workflow service to orchestrate Lambda functions that performed light transformations on the data, such as file format conversions (CSV to Apache Parquet, for example), schema validation, and partitioning.
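To make the light transformations concrete, here is a minimal sketch of the schema validation and partitioning steps using only the Python standard library. The schema, column names, and dataset name are hypothetical and are not taken from the OI platform. The Parquet conversion is left out because it needs a third-party library.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema for illustration: column name -> parser that raises on bad input.
EXPECTED_SCHEMA = {
    "form_id": str,
    "channel": str,
    "submitted_at": datetime.fromisoformat,
}


def validate_row(row: dict) -> dict:
    """Check the column names and parse each value; raise ValueError on a mismatch."""
    # DictReader uses a None key for surplus fields and None values for missing ones.
    if set(row) != set(EXPECTED_SCHEMA) or None in row.values():
        raise ValueError(f"row does not match schema: {row!r}")
    return {name: parse(row[name]) for name, parse in EXPECTED_SCHEMA.items()}


def partition_key(record: dict, dataset: str) -> str:
    """Build a Hive-style prefix (year=/month=/day=) from the record timestamp."""
    ts = record["submitted_at"]
    return f"{dataset}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"


def transform(csv_text: str, dataset: str) -> dict:
    """Validate every CSV row and group the parsed records by partition prefix."""
    partitions: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = validate_row(row)
        partitions.setdefault(partition_key(record, dataset), []).append(record)
    return partitions
```

In a Lambda function, each group returned by `transform` could then be written under its prefix in Amazon S3, so that downstream queries can prune partitions by date.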

From here, heavier transformations took over: change data capture (CDC) processing ran on AWS Glue, and complex business use cases were executed in SQL on an Amazon Redshift cluster. Using Amazon Redshift, AWS’ data warehouse service, these transformations could be run quickly and cost effectively.

For the serving layer, the OI platform needed to support many concurrent users, offer millisecond response times, and avoid a per-query pricing model to keep costs controlled. The decision was made to use a combination of Amazon Aurora and Amazon Redshift to achieve these three goals.

Amazon Aurora, a MySQL- and PostgreSQL-compatible relational database built for the cloud, would hold materialized views of the Amazon Redshift tables to provide fast response times of less than 100 ms.

Amazon Redshift would provide a significantly lower response time for complex queries that ran frequently or on demand on billions of records. These two services were configured within a PowerBI Gateway instance to allow PowerBI to access data from them. Data ran over AWS Direct Connect to ABS’ PowerBI instance, ensuring a high bandwidth throughput and consistent network experience.

A private React application (SPA) was built to give OI administrators and users visibility into the operations of the OI platform itself. This application supports functionality such as data lineage, transformation execution status and history, and handy features like one-click dataset reloading, so users can interact with the OI platform in an easy and transparent way.

Figure 2 – Example scheduled transformation dependent on two source transformations.
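A dependency like the one in Figure 2 amounts to running transformations in topological order. The sketch below shows one way to derive such an order with Python's standard `graphlib` module. The transformation names are hypothetical, and this is not the platform's actual scheduling engine.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation names: each key depends on the transformations in its set.
DEPENDENCIES = {
    "forms_by_channel": set(),
    "dwellings_by_area": set(),
    "uptake_by_area": {"forms_by_channel", "dwellings_by_area"},
}


def execution_order(dependencies: dict) -> list:
    """Return an order in which every transformation runs after its sources.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

Here `order` lists the two source transformations first and `uptake_by_area` last, so a scheduler could run the dependent transformation only once both of its sources have completed.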

Amazon Redshift + Amazon Aurora

One of the patterns implemented by the OI platform was the use of Amazon Aurora as a caching layer for Amazon Redshift. Queries were executed in Amazon Redshift, then the results were stored in an Aurora instance.

In this pattern, transformations are executed on a schedule in Amazon Redshift (every 20 minutes, for example), and the results are materialized and saved in Aurora PostgreSQL using the dblink extension.
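As a rough illustration of this pattern, the sketch below builds the SQL that an Aurora PostgreSQL database could run to cache a Redshift query result in a materialized view through `dblink`. It assumes the `dblink` extension and a foreign server pointing at the Redshift cluster are already set up. The view, server, and column names are hypothetical, and the platform's actual SQL is not shown in this post.

```python
def create_cache_view_sql(view: str, server: str, redshift_query: str, columns: dict) -> str:
    """Build Aurora PostgreSQL DDL for a materialized view fed by Redshift via dblink.

    All arguments are assumed to be trusted, developer-supplied values.
    """
    # dblink returns SETOF record, so the column names and types must be declared.
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE MATERIALIZED VIEW {view} AS\n"
        f"SELECT * FROM dblink('{server}', $REDSHIFT$\n"
        f"{redshift_query}\n"
        f"$REDSHIFT$) AS t({column_defs});"
    )


def refresh_cache_view_sql(view: str) -> str:
    """Build the statement a scheduler would run to re-materialize the cached results."""
    return f"REFRESH MATERIALIZED VIEW {view};"


# Hypothetical example: cache Census uptake per area.
print(create_cache_view_sql(
    view="uptake_cache",
    server="redshift_server",
    redshift_query="SELECT area, forms FROM uptake",
    columns={"area": "text", "forms": "bigint"},
))
print(refresh_cache_view_sql("uptake_cache"))
```

With this approach the heavy query runs in Amazon Redshift only when the view is created or refreshed, while dashboard reads are served from the rows stored in Aurora.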

By making these results available in Aurora, performance improved significantly, with average access latency reduced from more than 2 seconds on Amazon Redshift to about 100 ms on Aurora.

Scalability is another important benefit of this pattern. The OI platform can easily adjust to handle spikes in the number of dashboard users by adding new read replicas in Aurora in just a few seconds.

Working in Partnership

The development of the OI platform demanded a fast pace, with business and development teams working together closely to deliver the right outcomes.

With involvement from four different organizations—ABS, AWS, Shine Solutions, and ARQ Group—this could have been very complicated. However, the four groups came together to make sure the result was effective, user friendly, and well-architected.

The team divided the work such that AWS and Shine built the underlying platform, ingestion processes, ordered SQL processing engine, data integrity checks, and developer tooling for the SQL transformations. Meanwhile, ARQ supported the development of the initial PowerBI visualizations.

This allowed the ABS to focus on what they do best—understanding and running the Census.

Having ABS developers focus on the transformations resulted in a deep understanding of the use cases from the wider ABS organization. This ensured the transformations and reports provided in-depth analysis and delivered value across the board, from the executive to operational users.

Daily combined stand-ups facilitated close collaboration, as everyone involved was able to understand the ABS requirements and any pain points. This allowed the project to pivot quickly and adapt to changing use cases.

One such example was the implementation of additional ingestion, analysis, and reports based on the logs of the CDS. These were added late in the project to address a new reporting and monitoring requirement, and were accomplished thanks to the platform’s extensibility and the close collaboration between AWS, Shine, and the ABS.

Outcomes

Using the operational insights platform, ABS business owners were able to get near real-time insights into metrics about the Census:

  • Form submissions
  • Uptake of the Census across the country
  • Comparison of online forms vs. paper forms, broken down by local area

In 2016, these metrics were made available to business owners once every 24 hours.

Using the OI platform, that frequency increased to once every 20 minutes, a much higher rate than in previous Census events, with more frequent updates possible but not required. This allowed the ABS, an intensely data-driven organization, to make more accurate decisions much faster.

To learn more about running your data lake on AWS, visit the Analytics on AWS website.

Shine Solutions – AWS Partner Spotlight

Shine Solutions is an AWS Public Sector Partner that provides leading-edge AWS innovation for large enterprises and government.