Running critical workloads with Amazon EKS and AWS Fargate at Generali Italia
This blog was co-authored by Matteo Generali, Head of Digital Factory – Generali Italia; Andrea Caligaris, Claims & Health applications development lead – Generali Italia; Lorenzo Micheli, Senior Cloud Infrastructure Architect – AWS Professional Services; and Ettore Trevisiol, Cloud infrastructure Architect – AWS Professional Services.
Who is Generali Italia?
Generali Italia is one of Europe’s largest insurers with gross written premiums totaling more than 24.6 billion euros and a nationwide network of 40,000 distributors, as well as online and bancassurance channels, 13,000 employees, and 120 billion euros of assets under management. The Generali Italia group includes Alleanza Assicurazioni, Cattolica Assicurazioni, Das, Genagricola, Genertel, Genertellife, Generali Welion, and Generali Jeniot.
In 2019, we began an ambitious IT transformation program aimed at redefining our technology culture and redesigning our core systems. The cloud has been a fundamental pillar of the program, allowing our teams to reap benefits such as greater autonomy, unparalleled architectural flexibility, and quick time to market.
For the most critical insurance processes, Generali Italia decided to partner with AWS to embrace the transformation journey.
Use case – claims management
The claims department was identified as the one that could benefit the most from the digital transformation. Claims is one of the few touchpoint processes with final customers, and it is strictly interconnected with a large ecosystem of players: garage shops, external experts, lawyers, doctors, third-party managers for special coverages, and so on.
As part of the IT transformation program, Generali Italia started a complete review of the claims processes with the goal of improving customer experience and supporting claims handler operators by unifying applications and introducing advanced support tooling.
During this review process, Generali Italia saw the opportunity to raise the bar by:
- Expanding the DevOps team’s agility to improve feature delivery time
- Increasing workload availability and reliability
- Reducing infrastructure costs and maintenance effort
- Having an up-to-date environment with the latest security features
In order to implement these requirements, the new Claims System was planned and built using a microservice architecture with a front end that is adaptable to many formats and devices.
The claims backend consists of several Java Spring Boot microservices; therefore, containers were a perfect fit.
Initially, we deployed them on premises on a self-managed Docker Swarm cluster. As the overall IT transformation of Generali Italia progressed, it became clear that the market was shifting towards different technologies, and that Swarm lacked features commonly found in other container orchestration solutions. It was also time-consuming to keep the cluster updated, and any important change or mistake could affect all the functional domains on the cluster, forcing us to operate outside office hours and run several non-regression tests each time.
Workload migration options
When we sat down with AWS Solution Architects and the Professional Services team, we were driven by the following high-level guidelines:
- Security is a must. Insurance is a highly regulated business and manages highly sensitive data.
- Being mostly a back-office business, claims management is characterized by a high but predictable volume of data during given hours; therefore, reliability, ease of operations, and availability during business hours are crucial.
- Due to several legacy on-premises dependencies, we had to maintain the same performance levels in the overall user experience.
- It was important to boost the adoption of managed services to reduce the operational burden.
As containers were already widely used within Generali Italia, and our DevOps teams were already familiar with the technology stack, we considered the following container orchestration services on AWS: Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS).
At that time, Generali Italia was looking for a solution that was portable, widely used in the market, and that could be deployed on premises or even on local machines. Kubernetes met all these requirements, and Generali Italia is moving to adopt it as part of its enterprise stack; therefore, adopting Amazon EKS was an easy choice.
EKS is a managed service that allows us to automate the deployment, scaling, and management of containerized applications on AWS without requiring us to operate and maintain a Kubernetes control plane.
Moreover, it provides a serverless implementation thanks to the support of AWS Fargate.
AWS Fargate is a technology that allows you to run containers on demand without having to manage servers and worker nodes. This would help us lower operational costs and maintenance efforts even further.
AWS made EKS on Fargate generally available in December 2019, and Generali Italia was one of the first European customers to adopt this technology.
To deploy our applications on a highly available platform, we created our EKS cluster in the eu-south-1 Region (Milan), spanning its three Availability Zones.
As we didn’t have any experience with AWS Fargate, the first challenge was to size the Fargate pods in terms of CPU and RAM for our Java applications to get the right balance between cost and performance. To manage load peaks, we set up the standard Horizontal Pod Autoscaler to automatically adjust the number of Fargate pods.
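To make the sizing exercise more concrete, the following Python sketch shows how a pod's CPU and memory requests could map to a Fargate configuration, and how the Horizontal Pod Autoscaler derives a replica count. It is an illustration, not the tooling used in the project. The table of vCPU and memory combinations, the 256 MB that Fargate reserves for Kubernetes components, and the function names are assumptions based on the public AWS and Kubernetes documentation, so check them against the current docs. The autoscaler calculation is simplified and ignores tolerance and stabilization windows.

```python
import math

# Assumed Fargate vCPU -> selectable memory sizes in GB (subset of the public table).
FARGATE_COMBINATIONS = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),    # 2 GB to 8 GB
    2: list(range(4, 17)),   # 4 GB to 16 GB
    4: list(range(8, 31)),   # 8 GB to 30 GB
}
# Assumed memory that Fargate adds to each pod for Kubernetes components.
FARGATE_OVERHEAD_GB = 0.25


def fargate_size(cpu_request, memory_request_gb):
    """Return the first (vCPU, memory GB) combination that fits the requests."""
    needed_memory = memory_request_gb + FARGATE_OVERHEAD_GB
    for vcpu in sorted(FARGATE_COMBINATIONS):
        if vcpu < cpu_request:
            continue
        for memory in FARGATE_COMBINATIONS[vcpu]:
            if memory >= needed_memory:
                return vcpu, memory
    raise ValueError("request exceeds the combinations listed in this sketch")


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas, max_replicas):
    """Simplified Horizontal Pod Autoscaler rule, clamped to the allowed range."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical pod requesting 0.5 vCPU and 2 GB of memory.
    print(fargate_size(0.5, 2))
    # Hypothetical peak: 3 replicas at 90% CPU against a 60% target.
    print(hpa_desired_replicas(3, 90, 60, min_replicas=3, max_replicas=10))
```

In this sketch, a pod requesting 0.5 vCPU and 2 GB lands on a 0.5 vCPU / 3 GB configuration, because the reserved overhead pushes the total past 2 GB. This is the kind of effect worth checking when tuning JVM memory settings against cost.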
Since cost reduction is a driving factor, we configured our environment to downscale the services outside business hours and completely shut down our non-production environment during the night.
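The scheduling rule described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the business hours (08:00 to 19:00, Monday to Friday), the replica counts, and the environment names are hypothetical, and in practice the returned value would be applied to the cluster by a scheduled job.

```python
from datetime import datetime

# Hypothetical business hours: Monday to Friday, 08:00 to 19:00.
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 19


def desired_replicas(now, environment, business_replicas=3, off_hours_replicas=1):
    """Return how many replicas a service should run at the given time."""
    is_business_day = now.weekday() < 5  # Monday is 0, Friday is 4
    is_business_hour = BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR
    if is_business_day and is_business_hour:
        return business_replicas
    if environment != "production":
        # Non-production is shut down completely outside business hours.
        return 0
    # Production stays reachable with a reduced footprint.
    return off_hours_replicas


if __name__ == "__main__":
    monday_morning = datetime(2020, 6, 15, 10, 0)
    monday_night = datetime(2020, 6, 15, 23, 0)
    print(desired_replicas(monday_morning, "production"))
    print(desired_replicas(monday_night, "production"))
    print(desired_replicas(monday_night, "test"))
```

Because Fargate capacity is billed per running pod, reducing the replica count translates directly into lower cost, with no idle worker nodes left behind.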
Claims microservices have upstream and downstream dependencies running on premises; therefore, we require a stable and performant connection to our data centers in Mogliano and Padova. To meet this requirement, we deployed our workloads in the eu-south-1 Region and set up an AWS Direct Connect connection to the Generali data centers to deliver consistent, low-latency performance.
To further reduce dependencies on services deployed on premises and offload more undifferentiated heavy lifting to AWS, we migrated Kafka and Redis, originally deployed on the Docker Swarm cluster, to the AWS managed services Amazon MSK and Amazon ElastiCache.
To protect the internet-facing endpoints of our web applications from common types of attacks and threats, we leveraged additional services such as AWS WAF and AWS Shield. AWS WAF provides a preconfigured set of rules managed by AWS, which enabled us to get started extremely quickly and define more fine-grained rules over time.
A simplified view of the architecture is depicted in the following diagram:
A more detailed view of the architecture is shown in the following diagram:
Migration and collaboration with AWS
When we started the project, we had little experience with AWS. We involved AWS Professional Services to help us with the design of the target solution, the setup of our AWS environments, and workload migration.
We started with a proof of concept in a sandbox account. Since all our Java applications were already containerized, deploying them on EKS only required us to push the images to Amazon Elastic Container Registry (Amazon ECR) and translate the service definitions from Docker Swarm to Kubernetes. In just a few days, we deployed over 10 fully functional microservices. Motivated by the success of the proof of concept, we had the green light to kick off the project.
AWS Professional Services provided us with a baseline configuration for AWS services (AWS Organizations, Amazon VPC, AWS CloudTrail, AWS Config, Amazon GuardDuty, AWS Security Hub, and many more) supporting the solution design on EKS. This was a great chance for us to learn by doing, so that we could quickly understand how all those services work together and be autonomous once the solution was in production.
Both infrastructure and application deployments have been fully automated through our CI/CD pipelines, which enforce DevSecOps principles before anything is released into any environment. The claims team can create a new, fully functional environment in a matter of minutes and quickly test innovative technologies, reducing time to market to hours.
In June 2020, three months after the beginning of the project, we began to gradually roll out the production environment to our users.
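The translation from a Docker Swarm service definition to a Kubernetes manifest is mostly mechanical, as the following Python sketch suggests. It is a minimal illustration rather than the procedure followed in the project: the function name, the example service, and the image URL are hypothetical, and only a few fields of the Compose format (`image`, `deploy.replicas`, `environment`, and ports in the short `published:target` form) are handled.

```python
def swarm_service_to_deployment(name, service):
    """Map one Compose/Swarm service (as a dict) to a Kubernetes Deployment dict."""
    labels = {"app": name}

    # Compose allows environment as a mapping or as a list of KEY=VALUE strings.
    environment = service.get("environment", {})
    if isinstance(environment, list):
        environment = dict(item.split("=", 1) for item in environment)

    # Keep only the container-side (target) port of each "published:target" entry.
    ports = [int(str(port).split(":")[-1]) for port in service.get("ports", [])]

    container = {
        "name": name,
        "image": service["image"],
        "env": [{"name": k, "value": str(v)} for k, v in environment.items()],
        "ports": [{"containerPort": p} for p in ports],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": service.get("deploy", {}).get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical Swarm service definition for one microservice.
    claims_api = {
        "image": "registry.example.com/claims-api:1.0.0",
        "deploy": {"replicas": 3},
        "environment": ["SPRING_PROFILES_ACTIVE=cloud"],
        "ports": ["8080:8080"],
    }
    manifest = swarm_service_to_deployment("claims-api", claims_api)
    print(manifest["spec"]["replicas"], manifest["kind"])
```

A real migration also has to cover what this sketch leaves out, such as Kubernetes Service objects for exposing ports, secrets, health checks, and resource requests.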
Today, we can update and scale our infrastructure (like EKS and Amazon MSK clusters) during business hours with zero downtime and no impact on the workload, which has allowed us to drastically increase the overall availability. Moreover, having deployed across three Availability Zones, we improved our overall reliability, moving from the active-passive setup of Docker Swarm to an active-active architecture with at least three replicas on AWS.
This project established a new reference architecture and reusable assets, which allow Generali Italia to accelerate the deployment of future applications on AWS. Moreover, we consolidated our DevOps skills, increasing our confidence to develop and maintain cloud-native workloads.
Encouraged by the success of our first cloud workloads, we laid out a roadmap to support the entire organization with its transition. The cloud roadmap defined a set of tools that allows us to select and quickly deploy cloud systems, whether they are new, cloud-native workloads or lift-and-shifts of existing applications. The roadmap leverages the latest architectural principles and supports the evolution of our core systems, making it possible to move them to the cloud. We are working closely with our cloud infrastructure and platform teams to ensure greater control and security of our cloud workloads, as well as continuing cost optimization.
Matteo Generali is the Head of Generali Italia’s Digital Factory, responsible for the development of the country’s customer-facing digital touchpoints. Matteo is also a major stakeholder in the company’s IT transformation process, bringing his expertise in cloud and Agile.
Before this, Matteo was the CTO of a cloud-based technology company listed among AWS Technology partners in Italy.
Andrea Caligaris leads Claims & Health applications development at Generali Business Solutions. He has several years of insurance IT experience gained across different companies in Europe.
He is a cloud and agility enthusiast and loves to bike and swim whenever the opportunity arises. He currently lives in Treviso, Italy.