France’s National Railway Reduces Costs, Increases Productivity on AWS
France’s state-owned railway service, Société Nationale des Chemins de Fer Français (SNCF), requires sophisticated technology to manage and maintain safety along its 32,000-km rail network. In 2017 SNCF Réseau, the subsidiary that maintains and manages the SNCF rail infrastructure, set out to create a computer vision solution that could use images captured from train cameras to help the company identify potential rail malfunctions and anticipate maintenance needs. However, SNCF Réseau’s legacy data centers lacked the agility and throughput the company sought and were becoming obsolete and expensive to maintain. Although SNCF Réseau had access to a vast amount of data, it was largely segregated and unsuitable for the analysis necessary to facilitate the machine learning (ML) solution that the company envisioned.
Seeking to modernize its tech infrastructure, SNCF Réseau used Olexya, an Amazon Web Services (AWS) Select Consulting Partner, to migrate workloads to AWS. The broad migration involved shifting SNCF Réseau’s ML framework to Amazon SageMaker, the fully managed service that helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. On AWS, the team ultimately reduced the model training time from 3 days to 10 hours. Now in the cloud, SNCF Réseau is poised to realize ML- and Smart Data–driven predictive maintenance and unleash the potential of ML for many more initiatives companywide.
Modernizing and Standardizing in the Cloud
SNCF Réseau is the result of the 2015 reunification of the French rail network (known in France as Réseau Ferré de France, or RFF) and SNCF, which had maintained separate information systems until then. After the reunification, a lack of standardization hampered exchanges with managers in neighboring countries. Dedicated to maintenance, modernization, and security, SNCF Réseau has long sought to standardize data and maximize its usefulness within the company and for its European partners. The company therefore decided to redesign much of its legacy infrastructure on AWS, starting in 2019. SNCF Réseau determined that the breadth of AWS managed services met the company’s expectations for accelerating the implementation of its Smart Data strategy, a sweeping new approach that uses smart sensors to collect and analyze data relevant to the maintenance of SNCF rails in near real time.
One of the first steps involved migrating data to a data lake in Amazon Simple Storage Service (Amazon S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Looking to avoid the challenges associated with segregated data and nonstandard definitions, the company developed Ariane Model, a unified modeling language, to align definitions of rails, traffic, maintenance, and other key elements, in effect normalizing the data after collection. Based on RailTopoModel, a systemic model used by several European organizations, Ariane represented an important step toward regional standardization and regulatory and compliance reporting.
Clear, standard definitions helped SNCF Réseau create a “clean” data lake in Amazon S3. Instead of pouring raw data into a data lake and applying a layer of intelligence to make sense of it, the company developed the means to define objects relevant to particular use cases, including ML, before feeding them into the data lake. “Contrary to a data swamp, a clean data lake provides the level of confidence in data that managers need to formulate a strategy and make the right decisions,” says Samuel Descroix, head manager of SNCF Réseau’s geographic and analytic data department. From there, data scientists could query data using Amazon Athena, an interactive query service that makes it simple to analyze data in Amazon S3 using standard structured query language.
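As an illustration of the kind of query this enables, the sketch below builds the request parameters for Athena's StartQueryExecution API as exposed by boto3. The database, table, column names, and results bucket are hypothetical placeholders, not SNCF Réseau's actual schema.

```python
def build_athena_request(sql, database, output_location):
    """Return keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and columns, for illustration only.
SQL = """
SELECT segment_id, COUNT(*) AS defect_count
FROM rail_inspections
WHERE inspection_date >= DATE '2020-01-01'
GROUP BY segment_id
ORDER BY defect_count DESC
LIMIT 10
"""

request = build_athena_request(
    sql=SQL,
    database="clean_data_lake",                      # hypothetical database name
    output_location="s3://example-athena-results/",  # hypothetical results bucket
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   response = boto3.client("athena").start_query_execution(**request)
#   query_id = response["QueryExecutionId"]
```

Athena runs the query directly against the objects stored in Amazon S3 and writes the results to the output location, so no cluster has to be provisioned beforehand.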
Streamlining ML on AWS while Saving Costs
A key element of SNCF Réseau’s modernization is its computer vision model, designed to identify malfunctions or problems on rail lines and better anticipate maintenance needs. In 2017 the company had begun developing algorithms for artificial intelligence (AI) / ML in Python 2.7 with the Caffe2 deep-learning framework, but its on-premises data centers lacked the agility and throughput needed to enable effective ML, with models taking as long as 3 days to train. Having migrated to AWS and having established a clean data lake in Amazon S3, the company saw an opportunity to take advantage of the suite of services on AWS by adapting its Caffe2 framework to be trained on and deployed from AWS managed environments, including Amazon SageMaker.
In March 2020, the company deployed the code on its new system on AWS after only 2 weeks of tuning. With its framework now on AWS, SNCF Réseau data scientists could enjoy a high degree of autonomy for ML use cases, with access to the right tools at the right times. “Many tasks that had been complex for data scientists are greatly simplified on Amazon SageMaker,” says Descroix. Training time also dropped, from 3 days on the old system to just 10 hours on the new system, a reduction of about 86 percent.
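As a quick check of the figures above, 3 days is 72 hours, so a drop to 10 hours is a reduction of about 86 percent, or roughly a 7.2x speedup. The short sketch below shows the arithmetic. The helper name is illustrative only.

```python
def percent_reduction(before_hours, after_hours):
    """Percentage by which a duration shrank, relative to the original duration."""
    return (before_hours - after_hours) / before_hours * 100


old_training_hours = 3 * 24  # 3 days on the legacy system
new_training_hours = 10      # 10 hours on AWS

reduction = percent_reduction(old_training_hours, new_training_hours)
speedup = old_training_hours / new_training_hours

print(f"Reduction: {reduction:.1f}%  Speedup: {speedup:.1f}x")
# prints "Reduction: 86.1%  Speedup: 7.2x"
```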
The company also optimized costs by using Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, which let SNCF Réseau take advantage of unused Amazon EC2 capacity on AWS at a deep discount. Because the company’s ML workloads are compute intensive but not time sensitive, using Spot Instances through Managed Spot Training in Amazon SageMaker proved valuable, ultimately saving the team 71 percent in data science costs compared to Amazon EC2 On-Demand Instances. “Amazon SageMaker and Spot Instances were instrumental in simplifying and accelerating the deployment of AI/ML algorithms,” says Descroix.
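For readers unfamiliar with Managed Spot Training, the sketch below shows the request fields that turn it on for a SageMaker training job and how the savings SageMaker reports can be derived from a finished job. The field names follow the CreateTrainingJob and DescribeTrainingJob APIs. The bucket name and all durations are hypothetical, chosen only so the example lands on 71 percent. They are not SNCF Réseau's actual job data, and the 71 percent cited above may have been measured differently.

```python
def spot_training_settings(max_run_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build the spot-related fragment of a CreateTrainingJob request.

    The fragment is meant to be merged into a full request that also carries
    the job name, algorithm, role, and resource configuration.
    """
    # The wait time covers both waiting for Spot capacity and training itself,
    # so it cannot be shorter than the maximum runtime.
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait_seconds must be >= max_run_seconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        # Checkpoints let an interrupted job resume instead of starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def spot_savings_percent(job_description):
    """Savings as derived from DescribeTrainingJob timing fields."""
    training = job_description["TrainingTimeInSeconds"]
    billable = job_description["BillableTimeInSeconds"]
    return (1 - billable / training) * 100


# Hypothetical values for illustration only.
settings = spot_training_settings(
    max_run_seconds=10 * 3600,
    max_wait_seconds=12 * 3600,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
)
finished_job = {"TrainingTimeInSeconds": 36000, "BillableTimeInSeconds": 10440}
savings = spot_savings_percent(finished_job)

print(f"Managed Spot Training savings: {savings:.0f}%")
```

Spot capacity can be reclaimed at any time, which is why this approach suits workloads that are compute intensive but not time sensitive: an interrupted job waits for capacity and resumes from its last checkpoint.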
Olexya supported SNCF Réseau in setting up a DevOps team for the AWS deployment, helping reduce project delivery times from 3 months with the legacy system to less than 48 hours on the new system. Work is underway to shorten these delivery times further and to automate deployment completely, without human intervention. A key part of that effort involves Amazon Elastic Kubernetes Service (Amazon EKS), which helps users provide highly available and secure clusters and automates key tasks such as patching, node provisioning, and updates. Building on AWS also shortened the time needed to set up infrastructure, including spinning up trained models with the necessary resources provisioned, from 3–6 months to just 1 week.
Expanding ML Potential Companywide
As of January 2021, SNCF Réseau is in the process of optimizing its predictive maintenance algorithm for production and determining the appropriate hardware for its coaches. Beyond predictive maintenance, the company expects to pursue many more ML-driven initiatives. Its initial success has helped it develop what it calls a “BYOA” (bring your own algorithm) strategy, involving Spot Instances, Amazon SageMaker, and Amazon EKS.
Now operating largely on AWS, the company can iterate more quickly than before—the ML equivalent of switching from traditional rail to high-speed rail. The team has even decided to expand the scope of the solution to include geomapping features that will help with key decisions and further facilitate asset maintenance.
About SNCF Réseau
SNCF Réseau is a subsidiary of Société Nationale des Chemins de Fer Français (SNCF)—France’s national railway company. SNCF Réseau operates and manages the SNCF rail network infrastructure, which consists of about 32,000 km of rail lines.
Benefits of AWS
- Reduced AI/ML deployment time from 3–6 months to 1 week
- Deployed code on new ML system within 2 weeks
- Reduced model training time from 3 days to 10 hours
- Reduced project delivery times from 3 months to less than 48 hours
- Reduced data science costs by 71%
AWS Services Used
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices.
Amazon SageMaker
Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
Amazon EKS
Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises.