Accelerating Apache Spark and Hadoop Migrations with Cazena’s Data Lake as a Service on AWS
By Lovan Chetty, VP Product at Cazena
By Jacob Cokely, ISV Workload Migration Program PDM at AWS
By Dan Taoka, ISV Workload Migration Program PSA at AWS
Are you thinking of migrating to an Apache Hadoop and Spark cluster or data lake for analytics on the Amazon Web Services (AWS) Cloud? You’re not alone. Companies migrate data and analytics workloads to AWS for a variety of reasons, including significant skills shortages for managing complex stacks in on-premises data centers.
Running Hadoop, Spark, and related technologies in the cloud provides the flexibility required by these distributed systems. Augmenting the Hadoop Distributed File System (HDFS) with an object store like Amazon Simple Storage Service (Amazon S3) also has a significant positive impact on cost and resiliency. Not having to manage the physical infrastructure is a big plus.
However, on-premises Hadoop and Spark instances typically contain a large amount of sensitive data. To protect it, you need to move your on-premises security and compliance controls to the cloud before you move the data itself.
You also need to determine the cloud resources each workload needs to meet the service level agreements (SLAs) for performance on the cloud system.
Finally, you need to ensure the tools and data flows you have invested in over the years still work with this new cloud deployment.
A few cloud vendors and systems integrators have programs to handle these types of issues. The Amazon EMR migration program, for instance, offers a written guide and free local workshop to help you migrate on-premises workloads to the Amazon EMR big data platform.
Cazena provides a production-ready, continuously optimized and secured Data Lake as a Service. Available on AWS Marketplace, it has multiple features that make it easy to migrate your Hadoop and Spark analytics workloads to AWS without the need for specialized skills.
In this post, we walk through those features and explain how they make it easy to migrate to AWS while ensuring your data is as secure on the cloud as it is on-premises.
Cazena is an AWS Partner Network (APN) Advanced Technology Partner with the AWS Data & Analytics Competency. A launch partner in the AWS ISV Workload Migration Program, Cazena helps enterprises around the world accelerate AWS migrations and drive faster outcomes from data and analytics.
Overview of Cazena’s Data Lake as a Service
Cazena’s Data Lake as a Service includes five major capabilities to reduce migration efforts and improve security:
- Your data lake is available on day one.
- Enterprise-grade security and compliance are foundational.
- Gateways restrict connections to only users on the enterprise network.
- DevOps automation and monitoring are built in—no admin required.
- Continuous improvement of security operations reduces risk.
Available on Day One
Cazena’s Data Lake as a Service gives you a production-ready environment to load, store, and analyze your data immediately, on day one, with no additional work required. Your team can immediately begin migrating data and analytics workloads to AWS.
To make this possible, Cazena’s patented software automatically provisions a secure, single-tenant cloud environment. Within this secure perimeter, Cazena’s software deploys comprehensive monitoring and security frameworks to keep the production systems healthy and secure.
Finally, the software automatically deploys and configures the necessary analytic components so they are ready to use. Depending on the workload, Cazena leverages a variety of data and analytics technologies from AWS, Studio, and others.
To continuously improve performance and reduce resource usage, and to deliver the best configuration for each data lake, Cazena performs extensive workload benchmarking on AWS. This benchmarking runs real Hadoop, Spark, and analytics workloads on a variety of combinations of Amazon Elastic Compute Cloud (Amazon EC2) instances, storage options, and networking options.
This benchmarking data helps optimize the amount of cloud resources that need to be provisioned to support the migrated workloads within existing SLAs and budgets.
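As a rough illustration of how benchmark results could feed a sizing decision, the sketch below picks the cheapest configuration whose measured runtime still fits an SLA. The configuration labels, runtimes, and costs are made-up placeholders, and the selection rule is an assumption for illustration, not Cazena's actual method.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BenchmarkResult:
    config: str             # hypothetical label for an instance/storage/network combination
    runtime_minutes: float  # measured runtime of the benchmark workload
    hourly_cost_usd: float  # placeholder hourly cost of running that configuration

    @property
    def cost_per_run_usd(self) -> float:
        # Cost of one run = hourly rate * hours the run takes.
        return self.hourly_cost_usd * self.runtime_minutes / 60.0


def cheapest_within_sla(
    results: Iterable[BenchmarkResult], sla_minutes: float
) -> Optional[BenchmarkResult]:
    """Return the lowest-cost configuration that meets the SLA, or None."""
    eligible = [r for r in results if r.runtime_minutes <= sla_minutes]
    return min(eligible, key=lambda r: r.cost_per_run_usd, default=None)


# Illustrative numbers only; they are not real benchmark data.
RESULTS = [
    BenchmarkResult("config-a", runtime_minutes=90.0, hourly_cost_usd=4.0),
    BenchmarkResult("config-b", runtime_minutes=50.0, hourly_cost_usd=6.0),
    BenchmarkResult("config-c", runtime_minutes=30.0, hourly_cost_usd=12.0),
]

if __name__ == "__main__":
    choice = cheapest_within_sla(RESULTS, sla_minutes=60.0)
    print(choice.config if choice else "no configuration meets the SLA")
```

Tightening the SLA narrows the eligible set, so the same data can recommend a larger and more expensive configuration when a workload has a stricter deadline.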
Enterprise-Grade Security and Compliance
Cazena does not control the data that companies move into their data lake, so all data is treated as highly sensitive. Cazena automatically encrypts all data at rest that is moved into the service, ensuring that data, log files, and metadata are never stored in plain text.
This encryption is foundational to the Cazena service, and not a capability that can be turned off by a user. All Cazena service endpoints have Transport Layer Security (TLS) enabled so that any data in motion is encrypted as well.
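On the client side, an application can refuse to talk to any endpoint that does not offer TLS. The sketch below uses Python's standard `ssl` module to build such a client context. The TLS 1.2 minimum is an assumption chosen for illustration; the post does not state which protocol versions the service requires.

```python
import ssl


def build_tls_client_context() -> ssl.SSLContext:
    """Create a client context that verifies certificates and requires TLS 1.2+."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed minimum version for this sketch; adjust to your own policy.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = build_tls_client_context()
    print(ctx.minimum_version.name, ctx.verify_mode.name, ctx.check_hostname)
```

A context like this can be passed to standard-library clients such as `http.client.HTTPSConnection` through their `context` parameter.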
However, encryption alone is not sufficient security for a production data lake. During initial provisioning, Cazena ensures networks are configured to force isolation of components.
This approach allows fine-grained control of the communication between networks. A combination of intrusion detection, intrusion prevention, anomaly-based host intrusion detection, anti-virus, and anti-malware tools monitors all traffic and interactions so that unanticipated actions are flagged and escalated.
This array of technology is wrapped in the right processes and controls to ensure there is consistency, auditability, and transparency. Cazena starts with standards from the National Institute of Standards and Technology (NIST) and obtains third-party attestation of controls via SOC 2 auditing.
Gateways Restrict Connections
Each Cazena Data Lake as a Service is private to a single enterprise customer and accessible only to users on that customer’s network.
Cazena software provides an additional layer of security on AWS by ensuring there are no public endpoints to the data lake. This approach is preferred from a security and compliance perspective because it minimizes the attack surface of the service.
Connectivity to the data lake on AWS is enabled via the Cazena Gateway, a software component that’s deployed within enterprise customers’ private network. This exposes all Cazena endpoints as private services within the enterprise’s private network. This access architecture also ensures any user of the Cazena service has to first be granted access to their enterprise’s private network.
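The access rule described here, that only clients on the enterprise network can reach the service, can be sketched as a simple address check. This is not how the Cazena Gateway is implemented; the network ranges and the function name are hypothetical and only illustrate the idea of admitting private-network addresses and rejecting everything else.

```python
import ipaddress

# Hypothetical enterprise address ranges, used here as placeholders.
ENTERPRISE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def is_on_enterprise_network(client_ip: str) -> bool:
    """Return True only if client_ip parses and falls inside an allowed range."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed input is treated as outside the network.
        return False
    return any(address in network for network in ENTERPRISE_NETWORKS)


if __name__ == "__main__":
    for ip in ("10.1.2.3", "192.168.11.5", "203.0.113.7"):
        print(ip, is_on_enterprise_network(ip))
```

Denying by default, including on unparseable input, mirrors the goal of keeping the attack surface small.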
Figure 1 – Cazena network architecture using Cazena Gateway.
Continuous Optimization, DevOps Automation, and Monitoring
Once an enterprise starts to use its data lake, the service moves into DevOps mode. This means Cazena monitors and optimizes all layers of the service, from the cloud infrastructure to the analytic compute engines, as well as the Cazena software that enables a production system.
Cazena monitors the health of all components, and rectifies any anomalies it finds. Cazena has trained its software and processes to consume and analyze all of the logging produced by each of the components mentioned above, so that potential issues are quickly identified and rectified.
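To make the idea of log-driven monitoring concrete, here is a minimal sketch that flags components whose share of ERROR lines exceeds a threshold. The log format (`component LEVEL message`), the sample lines, and the 20% threshold are all assumptions for illustration; the post does not describe how Cazena's analysis actually works.

```python
from collections import Counter
from typing import Iterable, List


def components_over_error_threshold(
    log_lines: Iterable[str], max_error_ratio: float = 0.2
) -> List[str]:
    """Return components whose ERROR share of log lines exceeds max_error_ratio."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip lines that do not match the assumed format
        component, level = parts[0], parts[1]
        totals[component] += 1
        if level == "ERROR":
            errors[component] += 1
    return sorted(c for c in totals if errors[c] / totals[c] > max_error_ratio)


# Made-up sample lines in the assumed "component LEVEL message" format.
SAMPLE_LOGS = [
    "spark INFO job started",
    "spark ERROR executor lost",
    "spark ERROR executor lost",
    "hdfs INFO block report received",
    "hdfs INFO block report received",
    "gateway WARN slow response",
]

if __name__ == "__main__":
    print(components_over_error_threshold(SAMPLE_LOGS))
```

A production system would look at far richer signals than an error ratio, but the shape is the same: aggregate per component, compare against an expected baseline, and raise what deviates.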
Ongoing DevOps and continuous optimization ensure all data lake components function within the prescribed performance goals or SLAs. This means an enterprise can focus on using its data and analytics strategically, without spending any resources on performance issues or troubleshooting.
Continuous Improvement Reduces Risk
Security postures are not static, and all data and analytics environments require ongoing security monitoring and operations.
Cazena’s automation performs vulnerability assessments on all data lake components every day. Cazena identifies required patches and builds them into an automation framework, which ensures all components are automatically patched and kept up to date within compliance controls.
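One small piece of such an assessment, comparing installed versions against the minimum patched versions, might look like the sketch below. The component names, version numbers, and the dotted-integer version format are hypothetical; real vulnerability scanners and Cazena's automation are considerably more involved.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.10.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def components_needing_patch(
    installed: Dict[str, str], minimum_patched: Dict[str, str]
) -> List[str]:
    """Return components whose installed version is below the patched minimum."""
    return sorted(
        name
        for name, version in installed.items()
        if name in minimum_patched
        and parse_version(version) < parse_version(minimum_patched[name])
    )


# Hypothetical inventory and advisory data, for illustration only.
INSTALLED = {"component-a": "2.4.1", "component-b": "3.10.0", "component-c": "1.0.0"}
MINIMUM_PATCHED = {"component-a": "2.4.3", "component-b": "3.9.2"}

if __name__ == "__main__":
    print(components_needing_patch(INSTALLED, MINIMUM_PATCHED))
```

Comparing integer tuples rather than raw strings matters here: as strings, "3.10.0" sorts before "3.9.2", which would wrongly flag component-b as unpatched.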
In addition, Cazena carefully analyzes the data produced by intrusion detection, intrusion prevention, anomaly-based host intrusion detection, anti-virus, anti-malware, and related security systems. Potential malicious behavior goes directly to Cazena security experts for deeper analysis.
Figure 2 – Cazena built-in security operations.
Customer Use Case: 14 West
14 West is the business services arm of The Agora, a network of media and marketing companies that produces and markets 300+ financial, health, and lifestyle publications to more than 4 million people around the globe. A significant amount of 14 West’s operations focus on sending emails to the company’s subscribers and potential subscribers.
The IT team at 14 West saw an opportunity to consolidate data from siloed operational systems to make it more accessible and available for analytics. The data volumes, variety, and vision dictated a modern platform, with capabilities for advanced analytics—or a data lake.
The data lake would consolidate data from major applications, including CRM, email, web logs, and others. But the potential challenges were all too familiar to 14 West’s CIO.
“We didn’t want to take on the burden of standing up a Hadoop environment and maintaining it,” says Reid McLaughlin, CIO at 14 West. “At my past organization, we had about 10 folks working on it full-time. We felt there wasn’t value in that. Now, Cazena can do this for us.”
Cazena delivered a private Data Lake as a Service on AWS with built-in automation and security, and with no admin required. The deployment choice helped the team move quickly.
“Cazena’s Data Lake as a Service was also a lower-risk way to see if a data lake would meet our needs,” says McLaughlin. “We were able to get an environment for advanced analytics up really quickly, without needing to hire a team.”
The processing power of the data lake has allowed 14 West to dramatically reduce the time to insights. The data team is also building more capabilities, including data streaming using Apache Kafka and Spark to build more complex data pipelines, and delivering more self-service.
“Data Lake as a Service allowed us to focus our resources on adding more data thought leaders to the group. We’ve been able to build out our strategic data engineering and not have to build out a huge operational team,” McLaughlin says.
Cazena’s Data Lake as a Service includes software and automation that reduce the risks typically involved in cloud migrations. By using Data Lake as a Service for AWS migrations, enterprise teams can focus purely on data and the workloads that need to move, without worrying about infrastructure, security, or platform operations.
Cazena has experience with enterprise Hadoop, Spark, data lake, and analytics migrations to AWS. Cazena is a launch partner in the AWS ISV Workload Migration Program, and its architecture, security, and migration patterns have been reviewed and documented in a repeatable migration process playbook, available through AWS.
It’s easy to get started with Cazena’s Data Lake as a Service on AWS to accelerate migrations and outcomes. Find the Data Lake as a Service Starter Edition on AWS Marketplace, with private offers available. You can also contact Cazena for instant access to a guided pilot program.