ENGIE Builds the Common Data Hub on AWS, Accelerates Zero-Carbon Transition
ENGIE, one of the largest utility providers in France and a global player in the zero-carbon energy transition, produces, transports, and deals in electricity, gas, and energy services. With 160,000 employees worldwide, ENGIE is a decentralized organization that operates 25 business units with a high level of delegation and empowerment. Across its decentralized global customer base, ENGIE had accumulated large volumes of data, and the company needed a smarter approach and a single solution to align its initiatives and provide data that is ingestible, organizable, governable, sharable, and actionable across its global business units.
In 2018, ENGIE decided to accelerate its digital transformation through data and innovation by becoming a data-driven company. First, ENGIE wanted to build an enterprise data repository named the Common Data Hub to align its customers and business units around the same solution. The Common Data Hub helped ENGIE’s business units easily ingest, store, share, and consume datasets through a unified platform and highly secured environment, ultimately enabling the company to increase productivity, make accurate energy-production predictions, and bring new services to customers.
ENGIE used Amazon Web Services (AWS) to create the Common Data Hub, a custom solution built using a globally distributed data lake and analytics solutions on AWS. The Common Data Hub empowers teams to innovate by simplifying data access and delivering a comprehensive set of analytics tools. AWS Professional Services supported ENGIE in designing and implementing the solution and putting in place an internal service team (called the data@ENGIE team) that is in charge of evolving and operating the Common Data Hub platform.
Identifying a Need for Smarter Data
ENGIE’s customers vary widely, from cities to retail customers to large companies and beyond. The company increasingly supports customers’ ability to generate their own electricity with decentralized assets, including solar panels and wind farms. As ENGIE trended further toward decentralization, it found that its SAP (Systems, Applications, and Products in Data Processing) enterprise resource planning software needed updating. The company needed a uniform method of collecting and analyzing data to help customers manage their value chains. “We need to use data in order to measure consumption and anticipate the volumes of electricity production depending on the weather forecast, for instance,” says Gregory Wolowiec, the technology team leader who guides the development and delivery of ENGIE’s data programs. Wolowiec also cites problems of isolation and inconsistencies across countries: “All the solutions were different from one country to another; there was no sharing between the different parts of the organization. It became very important for us to be able to collect and share the data in a streamlined way everywhere on the planet.”
Yves Le Gélard, chief digital officer at ENGIE, explains the company’s purpose: “Sustainability for ENGIE is the alpha and the omega of everything. This is our raison d’être. We help large corporations and the biggest cities on earth in their attempts to transition to zero carbon as quickly as possible because it is actually the number one question for humanity today.”
ENGIE’s group chief data officer, Gérard Guinamand, adds, “Our strategy when it comes to data is actually directly linked to our purpose. If you want to drive and execute on a zero-carbon transition, you need first to gather data on what’s happening. That includes data on how much carbon dioxide you burn, where you burn it, and how it all correlates with environmental topics like the weather, the temperature, and the number of people. All this data needs to be stored, gathered, and computed so that you can measure progress and follow a road map.”
Whatever method ENGIE adopted needed to have a high level of security and be compliant with regulations all over the world. As the company put together a proof of concept, it explored a variety of solutions with local and global cloud providers. “We were convinced that AWS was a good solution for many reasons, including the cost model—and especially in terms of data storage,” says Wolowiec. So ENGIE began undertaking its large data project on AWS in mid-2018.
Developing the Common Data Hub and Deploying It around the World
Wolowiec describes the Common Data Hub as “the collaborative and distributed data lake that enables ENGIE to store data, share data, and create value with data.” It was built using Amazon Simple Storage Service (Amazon S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. The solution also uses Amazon Redshift, a fully managed, petabyte-scale data warehouse service in the cloud that can query semistructured data in the Amazon S3 data lake, demonstrating the lake house approach to data warehousing.
Because the solution uses Amazon Redshift, customers can deploy data warehouses securely in their Common Data Hub environments and make use of analytics. The company’s business unit administrators, by managing Amazon Redshift clusters on the Common Data Hub, can be added to Common Data Hub projects and easily access datasets in the Amazon S3 data lake, as well as build valuable insights from the Common Data Hub’s rich datasets catalog. The Common Data Hub uses Amazon Redshift for two different types of data access. It uses Amazon Redshift Spectrum for direct query to the Common Data Hub’s S3 buckets and uses Amazon Redshift as a provisioned data warehouse with its own internal storage. “We rely on the Amazon Redshift Spectrum feature to make the link between the Amazon S3 data lake managed by the Common Data Hub and the Amazon Redshift data warehouse,” says Wolowiec.
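The two access paths described above can be illustrated with a minimal sketch. It assumes a data catalog database that already describes the files in Amazon S3; the schema, database, table, column, and role names are hypothetical and are not taken from ENGIE's environment. The helper functions only build SQL text, so the sketch runs without an AWS account.

```python
def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build DDL that exposes a data catalog database to Redshift as an external schema."""
    # Identifiers are trusted constants here; do not interpolate user input like this.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def lake_join_sql(schema: str, lake_table: str, local_table: str) -> str:
    """Build a query joining an S3-backed external table with a table stored in Redshift."""
    return (
        f"SELECT s.site_name, AVG(r.power_kw) AS avg_power_kw\n"
        f"FROM {schema}.{lake_table} AS r\n"  # read through Redshift Spectrum
        f"JOIN {local_table} AS s ON s.site_id = r.site_id\n"  # provisioned storage
        f"GROUP BY s.site_name;"
    )


# Hypothetical names, for illustration only.
ddl = external_schema_sql(
    schema="lake",
    catalog_db="example_catalog_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
query = lake_join_sql("lake", "turbine_readings", "public.sites")

print(ddl)
print(query)
```

In this sketch, the external table stays in the Amazon S3 data lake while the site reference table lives in the warehouse's own storage, which mirrors the split between Amazon Redshift Spectrum and provisioned Amazon Redshift storage described in the text.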
Other AWS services involved in the Common Data Hub include Amazon Kinesis Data Streams (Amazon KDS), a massively scalable and durable real-time data streaming service. Using Amazon KDS enables ENGIE to easily collect, process, and analyze streaming data from Internet of Things devices in real time, meaning ENGIE can quickly gather the information it uses to develop insights. AWS Glue, a fully managed extract, transform, and load (ETL) service with a metadata repository, further helps transfer and clean the data. Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL, lets ENGIE business units view data. And to glean further insights from data, ENGIE relies on Amazon SageMaker, a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly.
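As a rough illustration of the streaming step, the sketch below shapes one Internet of Things reading into the arguments that the Kinesis Data Streams PutRecord operation expects (a stream name, a data blob, and a partition key). The stream name, device ID, and reading fields are hypothetical, and nothing here reflects ENGIE's actual payload format. The code only builds the record, so it runs without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_kinesis_record(stream_name: str, device_id: str, reading: dict) -> dict:
    """Return keyword arguments suitable for a Kinesis PutRecord call."""
    payload = {
        "device_id": device_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        **reading,
    }
    return {
        "StreamName": stream_name,
        # Kinesis carries an opaque blob; JSON encoded as UTF-8 is one common choice.
        "Data": json.dumps(payload).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": device_id,
    }


# Hypothetical stream, device, and measurements.
record = build_kinesis_record(
    stream_name="example-telemetry-stream",
    device_id="turbine-0042",
    reading={"wind_speed_ms": 11.3, "power_kw": 1840.0},
)

# With boto3 installed and credentials configured, the record could be sent with:
#   boto3.client("kinesis").put_record(**record)
print(record["PartitionKey"], len(record["Data"]), "bytes")
```

Using the device ID as the partition key is an assumption made for the example; it keeps each device's readings in order within a shard, at the cost of uneven load if a few devices are much chattier than the rest.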
To facilitate smooth and easy adoption of the Common Data Hub all around the world, ENGIE provided acceleration templates and documentation to help its business unit administrators see the value of the data they collect and access the data in the distributed data lake. The Common Data Hub also enables high levels of data governance and security. Data producers can share and control access to datasets and workflows, and consumers can request access to and consume data.
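The producer and consumer roles described above can be pictured with a small in-memory model. This is only a conceptual sketch under stated assumptions: the class, method, and business unit names are hypothetical, and the real Common Data Hub enforces access through AWS services rather than through code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A shared dataset whose producer controls who may read it."""
    name: str
    owner: str
    readers: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def request_access(self, consumer: str) -> None:
        """A consumer asks for read access; the request waits for the producer."""
        if consumer != self.owner and consumer not in self.readers:
            self.pending.add(consumer)

    def approve(self, approver: str, consumer: str) -> None:
        """Only the producer may grant a pending request."""
        if approver != self.owner:
            raise PermissionError("only the dataset owner can approve access")
        if consumer not in self.pending:
            raise ValueError(f"no pending request from {consumer}")
        self.pending.remove(consumer)
        self.readers.add(consumer)

    def revoke(self, approver: str, consumer: str) -> None:
        """The producer can withdraw access at any time."""
        if approver != self.owner:
            raise PermissionError("only the dataset owner can revoke access")
        self.readers.discard(consumer)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.readers


# Hypothetical business units.
dataset = Dataset(name="wind-farm-telemetry", owner="bu-renewables")
dataset.request_access("bu-trading")
before = dataset.can_read("bu-trading")   # request is still pending
dataset.approve("bu-renewables", "bu-trading")
after = dataset.can_read("bu-trading")    # producer has granted access
print(before, after)
```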
The integration of AWS services provided a secure, agile, and scalable solution for ENGIE. Now different business units can use the framework in the ways they need without sacrificing anything essential to operations. Ease of use and automation let ENGIE business units increase productivity quickly after building the Common Data Hub solution on AWS. There are also positive environmental impacts: ENGIE uses data to obtain the maximum amount of energy possible from wind farms, helping increase the efficacy of this important renewable energy source. “We provide the right tools so that entities can focus on value creation instead of taking time to deal with technical issues,” says Wolowiec. As of July 2020, ENGIE had collected 95 TB of data in the Common Data Hub.
Facilitating a Standardized Top-Down Approach
The Common Data Hub forms the backbone of ENGIE’s data-driven strategy by fostering a data community among information technology and business users, improving data literacy at every level of ENGIE, and helping optimize internal processes or create new data-driven services. All business units are now empowered with a single solution to build data-driven applications faster. ENGIE currently has more than 351 projects running on the Common Data Hub across the world. The Common Data Hub offers a truly cohesive solution since it eliminates silos and enables every department to benefit from equal access to the common framework.
With its new method of collecting and sharing data, ENGIE sees an opportunity to change how it does business, and the company is in the process of building a vertical data hub to do just that. ENGIE has historically had a bottom-up approach, with its business units providing services for customers in their respective regions. However, since many of its energy services are the same, this results in an unnecessary duplication of work. “Our electricity generation activity and especially our renewable energy generation are basically the same everywhere,” says Wolowiec. “We can use the Common Data Hub to build common use cases around the world. Next for us is to introduce more and more top-down approaches, especially for wind farms.”
ENGIE discovered significant value by using AWS services to build its Common Data Hub, enabling its global business units to collect, share, and analyze data in more productive ways. ENGIE’s business units still retain autonomy, but they can now benefit from the strengths of centralized data, garnering important insights from similar use cases as they discover newer, more efficient ways to power the world.
ENGIE is a global energy company with 25 business units operating worldwide. The company powers millions of customers and develops integrated solutions throughout the value chain to support corporations’ and local authorities’ zero-carbon transition.
Benefits of AWS
- Collected 95 TB of data across 351 projects
- Automated energy predictions
- Increased business unit productivity
- Maximized wind farm energy production
AWS Services Used
Amazon Kinesis Data Streams (KDS)
Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.
Amazon Redshift
Amazon Redshift is the world’s fastest cloud data warehouse and gets faster every year. Redshift powers analytical workloads for Fortune 500 companies, startups, and everything in between.
AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.