AWS Storage Blog
Siemens builds Datalake2Go on AWS to analyze disparate data globally
Siemens is a technology company focused on industry, infrastructure, transport, and healthcare. From resource-efficient factories, resilient supply chains, and smart buildings and grids, to cleaner and more comfortable transportation and advanced healthcare, the company creates technology with purpose, adding real value for its customers.
Siemens technology is everywhere, supporting the critical infrastructure and vital industries that form the backbone of the global economy. It analyzes its wealth of data to generate industry insights ranging from building construction to detecting and proactively repairing rail defects to remotely monitoring the condition of its ultrasound equipment in near real time.
For a company that has 311,000 employees across 190 countries, collecting and analyzing data from disparate sources is no small feat. Siemens has to pull data from its own operations and customers, suppliers, and partners across the globe—all while managing an IT architecture.
Today, that data is funneled into what Siemens calls Datalake2Go, built on Amazon Web Services (AWS). Datalake2Go is an accessible data lake that includes a preconfigured set of data analytics and artificial intelligence (AI) tools that Siemens uses to facilitate a standardized, modern, cloud-based data infrastructure.
With AWS and AWS Partners behind the scenes of Datalake2Go, Siemens performs data analytics at scale, which the company uses to drive sustainability as it grows globally.
Building Datalake2Go using nearly 30 services on AWS
Siemens Data Cloud, the company’s umbrella for most data analytics activities, houses both Datalake2Go and an enterprise-class SQL cloud data warehouse from Snowflake, an AWS Partner. Datalake2Go launched in March 2019 and has since grown to support 45 projects with 85 AWS accounts.
Siemens has 10 PB of data in Amazon Simple Storage Service (Amazon S3), an object storage service offering cutting-edge scalability, data availability, security, and performance. About 1 PB of that data is in Datalake2Go, with 4–10 TB of data used daily. Through automation and templatization, Datalake2Go reduces the operational burden of configuring AWS accounts, as well as migrating, storing, and managing data in accordance with Siemens' cybersecurity needs.
“We reduce the burden of creating and managing new cloud accounts because customers who use Datalake2Go get out-of-the-box connectivity, security, governance, and DevOps.”
– Maximilian Lauer, lead for cloud data analytics solutions and project manager, Siemens
Datalake2Go connects data sources through an enterprise-governed AWS account. The structured and unstructured data sources include Snowflake, Internet of Things systems, various software, and more than 100 APIs and databases connecting Siemens’ external vendors and internal business units.
Siemens connects those data stores using a variety of services, such as AWS Transfer Family, to securely scale and automate recurring business-to-business file transfers to AWS Storage services. “AWS Transfer Family is simple to incorporate in our solution with other services,” says Lauer. “It scales easily and has out-of-the-box integrations with common protocols, such as SFTP. Therefore, we can apply it to use a technology stack on AWS together with the existing systems.”
Once Datalake2Go ingests the data, it connects to two main accounts that manage the data, settings, policies, read and write operations, and networking. Data is then distributed into customer or project accounts.
For a standard approach to storing data, Datalake2Go uses Amazon S3. Internal teams that also need a relational database can use Amazon Relational Database Service (Amazon RDS), a collection of easy-to-manage relational databases in the cloud optimized for total cost of ownership. Siemens uses standardized templates to deploy and serve data across a variety of services to customers or applications.
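To make the idea of a standardized template concrete, here is a minimal sketch of how one might be generated per project. It is an illustration only: the post does not say which templating tool Siemens uses, so AWS CloudFormation is an assumption, and the function name `bucket_template`, the `dl2go-` naming scheme, and the `project` tag are hypothetical.

```python
import json


def bucket_template(project: str, environment: str = "dev") -> dict:
    """Return a CloudFormation template (as a dict) for one project bucket.

    Assumption: CloudFormation and the naming rules below are illustrative;
    the source does not describe the actual Datalake2Go templates.
    """
    if not project.isalnum():
        raise ValueError("project must be alphanumeric")

    # Hypothetical naming scheme: prefix, lowercased project, environment.
    bucket_name = f"dl2go-{project.lower()}-{environment}"

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Standard data bucket for project {project}",
        "Resources": {
            "DataBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    # Encrypt every object at rest by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                    # Block all forms of public access.
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                    "Tags": [{"Key": "project", "Value": project}],
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the template as JSON, ready to hand to a deployment tool.
    print(json.dumps(bucket_template("Rail01"), indent=2))
```

Generating every bucket from one function means security defaults such as encryption and blocked public access are defined once and applied to every project account, which is the kind of operational burden templatization is meant to remove.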
To facilitate extract, load, and transform (ELT) workflows, including data orchestration and data harmonization, Datalake2Go uses two services. AWS Glue is a serverless data integration service that makes it simpler to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. AWS Lambda is a serverless, event-driven compute service that runs code without provisioning or managing servers. Because the engineering teams understand Python, the combination of services simplifies the ELT burden.
“Everything is serverless because it scales dynamically and we can automate it better,” says Lauer.
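As a rough illustration of the event-driven Python pattern described above, the sketch below shows a Lambda-style handler that reacts to an Amazon S3 event notification and decides where each new object should go next. It is not Siemens' code. The routing rule and the `tabular` and `raw` labels are hypothetical; only the S3 event layout (`Records`, `s3.bucket.name`, `s3.object.key`) follows the documented notification format.

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list[dict]:
    """Pull bucket and key pairs out of an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 events (spaces become "+").
            objects.append({"bucket": bucket, "key": unquote_plus(key)})
    return objects


def handler(event, context):
    """Lambda entry point: classify each newly arrived object.

    Assumption: the suffix-based routing below is a placeholder for
    whatever harmonization step a real pipeline would trigger.
    """
    objects = extract_objects(event)
    for obj in objects:
        is_tabular = obj["key"].lower().endswith((".csv", ".parquet"))
        obj["target"] = "tabular" if is_tabular else "raw"
    return {"count": len(objects), "objects": objects}
```

Because the handler is a plain function that takes a dictionary, it can be exercised locally with a hand-built event before it is deployed.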
Pairing Datalake2Go and Mendix on AWS to save one factory 15,180 hours of work
Siemens can turn the rich data in Datalake2Go into business value by building applications on top of it. To quickly build solutions that reduce costs, increase productivity, and drive innovation, Siemens uses Mendix (a Siemens company), a cloud-based application development foundation that enterprises can use to build sophisticated applications with speed, collaboration, and control.
Mendix runs on AWS and uses a low-code visual approach to software development, empowering developers of varying skill sets to build solutions, and helping nontechnical domain experts ideate and iterate rapidly. Using Mendix, companies can build mobile and web applications 10 times faster and with fewer resources.
Siemens teams have used Mendix for 20–30 use cases to quickly build applications on top of Datalake2Go so they can run their own analytics without worrying about the backend. Siemens also used Amazon SageMaker—which builds, trains, and deploys ML models for virtually any use case—to develop an ML solution within Datalake2Go called ML2Go. “ML is becoming the asset to have in data-driven companies,” says Lauer.
Siemens is expanding its AI/ML environment using Amazon Bedrock, a fully managed service that makes it easy to build and scale generative AI applications with foundation models. “In the past, the asset of data analytics was only data,” says Lauer. “It’s still our main asset, but the other is ML and now foundational models for generative AI.”
Datalake2Go and Mendix also supported a Siemens plant in Nuremberg, Germany with an application for Intelligent Document Mapping. The factory builds customized racks with electrical components. Using Amazon OpenSearch Service—an open source, distributed search and analytics suite—in combination with Amazon SageMaker ML models, engineers feed a 70-page PDF of the design into the Intelligent Document Mapping application, which then builds step-by-step instructions. First, it analyzes the design, then creates a bill of materials, electrical diagrams, and mechanical drawings needed for the project, and finally determines and maps the position of component codes on all pages of the different documents.
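The last step, mapping where each component code appears across the pages of a document, can be pictured with a deliberately simplified sketch. The real application relies on Amazon OpenSearch Service and Amazon SageMaker models. The example below swaps in a plain regular expression over already-extracted page text, and the code format (`K12`, `QF3`) and the function name `map_component_codes` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical code format: one to three capital letters followed by digits.
CODE_PATTERN = re.compile(r"\b[A-Z]{1,3}\d{1,4}\b")


def map_component_codes(pages: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Map each component code to (page number, character offset) pairs.

    `pages` holds the plain text of each page, in order. Page numbers
    start at 1; offsets are zero-based positions within the page text.
    """
    positions = defaultdict(list)
    for page_number, text in enumerate(pages, start=1):
        for match in CODE_PATTERN.finditer(text):
            positions[match.group()].append((page_number, match.start()))
    return dict(positions)


if __name__ == "__main__":
    sample = ["Relay K12 feeds QF3.", "Mount K12 on rail."]
    for code, hits in map_component_codes(sample).items():
        print(code, hits)
```

An index like this lets an engineer jump from a component in the bill of materials to every page that mentions it, which is the lookup that previously had to be done by hand.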
Before, engineers had to perform those tasks manually. The Nuremberg factory has saved 15,180 hours of work for its employees and more than €720,000 per year. Siemens plans to adopt this solution across its factories.
Using AWS to scale Datalake2Go globally
Siemens plans to optimize and expand its data analytics by enhancing its ML capabilities using Amazon SageMaker and ML operations (MLOps) concepts. Siemens is building a container-based environment that’s highly connected to Datalake2Go using Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service.
On AWS, Datalake2Go can scale as Siemens grows its business globally while operating sustainably. Siemens is committed to operating at net zero by 2030 and expects to achieve a 55 percent reduction in emissions by 2025 compared to 2019. By 2025, it will use its highly connected environment on AWS to generate key performance indicators to track and advance sustainability progress in its Sustainability Data Cloud, hosted in both Snowflake and Datalake2Go.
“We will have to bring lots of savings and analytics within the next few years to reach our goals, and that’s a huge undertaking,” says Lauer. “But we will do it on AWS.”