AWS for Industries
Using serverless architecture for efficient SMETS 2 data ingestion and processing
In the dynamic landscape of energy consumption, the ability to monitor and manage energy usage has become a critical need for consumers. At the heart of this evolution are Smart Metering Equipment Technical Specifications (SMETS) 2 meters. These sophisticated meters represent the second and more advanced generation of smart meters in the United Kingdom. Unlike their predecessors (SMETS 1 meters), SMETS 2 meters come with enhanced capabilities and benefits that are designed to improve the efficiency, reliability, and accessibility of energy consumption data for both consumers and energy suppliers.
However, realizing the full potential of these smart meters comes with challenges. The sheer volume of data that these devices can generate presents a significant obstacle for suppliers like Centrica Business Solutions (CBS). Although CBS has a considerable portfolio of SMETS 2 meters across its customer base, the company’s on-premises infrastructure was initially unable to handle the influx of data, primarily because activating the full data flow meant handling hundreds of thousands of XML messages daily. This challenge wasn’t just a technical hurdle; it was a compliance issue as well. A mandate from the Department for Business, Energy and Industrial Strategy (BEIS) requires energy suppliers to make energy usage data accessible to their non-domestic customers, including a year’s worth of historic data from their smart meters, in a format that customers can easily access. This information, drawn from detailed half-hourly (for electricity) or hourly (for gas) readings, is crucial for giving customers the insights they need to make informed choices about their energy use.
Facing the dual challenges of technological limitations and regulatory requirements, CBS set out to transform its data handling capabilities using Amazon Web Services (AWS). This endeavor was not just about meeting a mandate but about unlocking the full potential of smart meter technology to enhance energy management for customers. This post walks through the steps that CBS took to overcome these challenges, the solution it implemented, and the impact this has had on its customers’ ability to understand and control their energy consumption.
Centrica Business Solutions (CBS)
Centrica Business Solutions (CBS) is part of Centrica PLC, a global energy and services company dedicated to providing innovative solutions to meet the changing needs of its customers. CBS focuses on helping businesses and organizations around the world improve their energy use and manage their energy infrastructure more efficiently and sustainably.
Architecture
Figure 1. CBS architecture on AWS, ingesting and processing SMETS 2 data
The data generated by SMETS 2 meters is voluminous and complex, which requires an efficient approach to ingest, process, and transform the data. Using the scalability and flexibility of serverless computing on AWS, CBS has engineered a robust pipeline that provides seamless processing of XML files into Parquet format for advanced analytics.
This meter data is collected by the Data Communications Company (DCC) in the United Kingdom through a wide area network (WAN).
The DCC assembles the meter data into files and exposes an API for authorized parties like CBS to retrieve the data.
CBS periodically polls the DCC API to check for any new meter data that is available. Once new data is available, it is retrieved through the API in a standard XML format.
Data ingestion: Harnessing the power of serverless compute
The SMETS 2 data arrives continually from the systems that retrieve it from the DCC API and is deposited into an inbound bucket in Amazon Simple Storage Service (Amazon S3), an object storage service built to store and retrieve virtually any amount of data from anywhere.
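The polling and landing steps described above can be sketched as follows. The post does not specify the DCC API, so `list_new_ids`, `fetch_xml`, and the `inbound/` key prefix are hypothetical placeholders passed in as plain callables; only the control flow is illustrated, not CBS’s actual implementation.

```python
from typing import Callable, Iterable, List


def poll_once(
    list_new_ids: Callable[[], Iterable[str]],  # hypothetical: asks the API what is new
    fetch_xml: Callable[[str], bytes],          # hypothetical: downloads one XML file
    store: Callable[[str, bytes], None],        # writes one object to the inbound bucket
    prefix: str = "inbound/",                   # assumed key prefix
) -> List[str]:
    """Retrieve every newly available file once and return the keys written."""
    written = []
    for file_id in list_new_ids():
        key = f"{prefix}{file_id}.xml"
        store(key, fetch_xml(file_id))
        written.append(key)
    return written


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without network access.
    fake_api = {"msg-001": b"<reading/>", "msg-002": b"<reading/>"}
    bucket = {}
    keys = poll_once(lambda: sorted(fake_api), fake_api.__getitem__, bucket.__setitem__)
    print(keys)
```

With boto3, `store` could be something like `lambda key, body: s3.put_object(Bucket="my-inbound-bucket", Key=key, Body=body)`, where the bucket name is a placeholder. Injecting the callables keeps the loop testable without credentials.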
Data filtering: Efficient parallel processing
Because not all the incoming data is needed, the files are filtered first. Every 24 hours, Amazon EventBridge Scheduler, a pay-per-invoke task and event scheduler that triggers based on parameters you define, starts a state machine in AWS Step Functions, a visual workflow service. Using the Step Functions Distributed Map state, which facilitates high-concurrency processing, the state machine filters files in parallel to identify those that require further processing. These files are moved to a designated Amazon S3 bucket for subsequent stages, while a lifecycle rule archives the remaining files in the inbound bucket, optimizing storage management.
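To make the scheduling and fan-out concrete, here is a minimal sketch that builds an Amazon States Language definition with a Distributed Map state, plus the arguments for a 24-hour schedule. The post does not say what runs inside each iteration, so the Lambda task is an assumption, and the schedule name, ARNs, bucket name, and concurrency limit are hypothetical.

```python
import json


def build_filter_state_machine(inbound_bucket: str, filter_function_arn: str,
                               max_concurrency: int = 100) -> dict:
    """Return an Amazon States Language definition with one Distributed Map state."""
    return {
        "Comment": "Filter inbound SMETS 2 files in parallel (illustrative)",
        "StartAt": "FilterFiles",
        "States": {
            "FilterFiles": {
                "Type": "Map",
                # List the objects in the inbound bucket as the map's input items.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:listObjectsV2",
                    "Parameters": {"Bucket": inbound_bucket},
                },
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    # DISTRIBUTED mode runs each iteration as a child workflow execution.
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
                    "StartAt": "CheckFile",
                    "States": {
                        "CheckFile": {
                            "Type": "Task",
                            # Assumption: a Lambda function decides whether a file is needed.
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": filter_function_arn,
                                "Payload.$": "$",
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


def build_daily_schedule(state_machine_arn: str, role_arn: str) -> dict:
    """Keyword arguments for EventBridge Scheduler's CreateSchedule API."""
    return {
        "Name": "smets2-daily-filter",  # hypothetical schedule name
        "ScheduleExpression": "rate(24 hours)",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {"Arn": state_machine_arn, "RoleArn": role_arn},
    }


if __name__ == "__main__":
    definition = build_filter_state_machine(
        "my-inbound-bucket", "arn:aws:lambda:eu-west-2:111122223333:function:filter")
    print(json.dumps(definition, indent=2))
```

The definition could then be passed as `definition=json.dumps(...)` to `boto3.client("stepfunctions").create_state_machine`, and the schedule arguments to `boto3.client("scheduler").create_schedule(**kwargs)`. Building both as plain dictionaries keeps them easy to review and unit test before deployment.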
Data transformation
The heart of CBS’s architecture is the data transformation phase. AWS Glue, a serverless data integration service, converts the XML files into Parquet format. Using the Databricks XML library, the AWS Glue job reads the input files and infers their schema automatically, so no schema has to be maintained by hand. The job is triggered within the state machine after the incoming data has been filtered. Once the data is transformed, the results are stored in the outbound Amazon S3 bucket, ready for advanced analytics.
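The conversion step can be sketched in a few lines of PySpark. This is an illustrative outline rather than CBS’s job: the `row_tag` value and both S3 paths are hypothetical, and it assumes the spark-xml library is available to the job so that the `xml` data source resolves.

```python
def convert_xml_to_parquet(spark, source_path: str, target_path: str, row_tag: str) -> None:
    """Read XML with schema inference and write the result as Parquet."""
    readings = (
        spark.read.format("xml")     # data source provided by the spark-xml library
        .option("rowTag", row_tag)   # XML element that becomes one row
        .load(source_path)           # schema is inferred from the files
    )
    readings.write.mode("append").parquet(target_path)


# Example call with placeholder values (requires an active Spark session):
# convert_xml_to_parquet(
#     spark,
#     "s3://my-filtered-bucket/xml/",
#     "s3://my-outbound-bucket/parquet/",
#     row_tag="Reading",
# )
```

Inside an AWS Glue job, the session is typically obtained with `GlueContext(SparkContext.getOrCreate()).spark_session`, and the library JAR is supplied through a job parameter such as `--extra-jars`. Passing the session in as an argument keeps the function free of Glue-specific imports.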
To streamline metadata management, an AWS Glue crawler is triggered after the transformation. It updates partitions and enriches the AWS Glue Data Catalog, which is an index to the location, schema, and runtime metrics of your data. This makes the SMETS 2 datasets accessible and discoverable for analysts. The crawler creates AWS Native Delta tables, which support time travel queries and atomicity, consistency, isolation, and durability (ACID) transactions. AWS Native Delta tables improve performance compared with open-source Delta implementations that rely on manifest files, and because the tables are registered as Delta tables in the AWS Glue Data Catalog, they can be referenced using any engine of choice. This also saves time by removing the need to read the tables with a Delta path through Apache Spark and then register them as temporary views for downstream processing.
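As a rough illustration of the crawler setup, the sketch below assembles the request for the AWS Glue `CreateCrawler` API with a Delta target that creates native tables instead of manifest files. The crawler name, role ARN, database, and table path are hypothetical, and it assumes the target location already holds Delta tables.

```python
from typing import Iterable


def build_delta_crawler_request(name: str, role_arn: str, database: str,
                                delta_table_paths: Iterable[str]) -> dict:
    """Keyword arguments for the AWS Glue CreateCrawler API with a Delta target."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "DeltaTargets": [
                {
                    "DeltaTables": list(delta_table_paths),
                    "WriteManifest": False,          # do not generate manifest files
                    "CreateNativeDeltaTable": True,  # register tables as Delta in the catalog
                }
            ]
        },
    }


if __name__ == "__main__":
    request = build_delta_crawler_request(
        "smets2-delta-crawler",                          # hypothetical crawler name
        "arn:aws:iam::111122223333:role/glue-crawler",   # hypothetical role
        "smets2",                                        # hypothetical database
        ["s3://my-outbound-bucket/readings/"],           # hypothetical table location
    )
    print(request)
```

The request could be submitted with `boto3.client("glue").create_crawler(**request)` and run with `start_crawler(Name=...)`, for example from a task at the end of the state machine.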
Conclusion
By adopting serverless architectures, companies like CBS can efficiently manage the volume of data from SMETS 2 smart meters and gain rapid, cost-effective insights. This approach not only streamlines operations but also helps businesses use their data in near real time, driving innovation and advancing the clean energy transition. As the volume and complexity of data continue to grow, CBS’s approach paves the way for future advancements in data management and analysis.