IBM & Red Hat on AWS
Solve data insight challenges with Starburst Enterprise on Red Hat OpenShift Service on AWS
Organizations often have data residing in multiple sources across the enterprise and need a way to collate this data. Having a performant data lake on a trusted and well-managed platform helps them gain better insight into their business and make informed decisions.
The Starburst Enterprise open data lake house solution deployed on Red Hat OpenShift Service on AWS (ROSA) helps organizations activate the data in and around their lake to uncover new insights. It provides a fast and scalable data lake house powered by Trino, a leading SQL analytics engine. It supports a wide range of data sources and clients while providing integration with familiar business intelligence (BI) tools.
With Starburst Enterprise on ROSA, customers can enable fast data processing and analytics across complex data architectures – including support for cloud and hybrid workloads as shown in Figure 1.
What is Starburst Enterprise?
Starburst Enterprise is a fully supported distribution of Trino. It connects to familiar data sources such as Snowflake, Oracle, PostgreSQL, Teradata, and Kafka. It also supports BI tools like Tableau, Power BI, and Looker, as well as data tools and engines such as dbt, Airbyte, Spark, JDBC, and Python clients. Customers can take advantage of these integrations when building their data lake in order to maximize data insights and focus on their critical business requirements. Starburst Enterprise is a containerized query engine that supports autoscaling, so it is well suited for Kubernetes environments such as ROSA. Starburst Enterprise provides Helm charts as well as a certified OpenShift operator, allowing for efficient management, better performance, and cost optimization. Visit the Starburst Enterprise on Red Hat OpenShift page for more information about the benefits of deploying Starburst Enterprise on OpenShift.
In this blog, we will walk you through the options for running Starburst Enterprise on ROSA. We will discuss the architecture and show you how the Starburst, Red Hat, and AWS components come together to provide a flexible, performant, and scalable analytics solution. We will also explore the architectural decisions to consider so you can choose the one that best fits your organization’s needs.
Why ROSA for Starburst Enterprise in AWS?
Red Hat OpenShift Service on AWS (ROSA) is a fully managed, turnkey application platform that allows you to focus on deploying applications and accelerate innovation by off-loading the cluster lifecycle management to Red Hat and AWS. It is native in the AWS console and integrates with other commonly used AWS services.
ROSA provides a managed Red Hat OpenShift cluster and is a supported configuration for Starburst Enterprise. Leveraging ROSA allows organizations to focus their resources on delivering business value, and lets AWS and Red Hat handle the undifferentiated management of the underlying platform.
Starburst Enterprise on ROSA provides several deployment models allowing you to select the configuration that is best suited for your needs. This gives organizations the agility they need to satisfy the ever-changing requirements of a modern analytics solution like data sovereignty, data gravity and scalability to name a few. This also enables continuous delivery of actionable insights in a timely and effective manner as their data ecosystem evolves.
Starburst Enterprise on ROSA deployment
We will cover three options to deploy Starburst Enterprise on ROSA:
- Starburst Enterprise on ROSA (single region)
- Starburst Enterprise on ROSA (multi-region)
- Starburst Enterprise on ROSA (local zones)
1. Starburst Enterprise on ROSA (single region)
The simplest way to get started with Starburst Enterprise on ROSA is to deploy within a single AWS region and VPC, as shown in Figure 2. In this configuration, the ROSA cluster is deployed in a region and VPC of your choice. Once the ROSA cluster is up and running, the OpenShift operator or Helm charts can be used to deploy Starburst Enterprise. You can choose the Amazon Elastic Compute Cloud (Amazon EC2) instance type, size, and purchasing options that best suit your needs and can change those selections as your needs evolve.
ROSA and Starburst Enterprise work in concert to automatically add compute resources (scale out) when there are spikes in demand and then remove those resources (scale in) when demand levels out. You can connect to data sources in AWS as well as data sources that are on premises. For security, Starburst Enterprise has built-in access controls as well as an integration with AWS Lake Formation so that access to data can be properly governed.
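To make the scale-out and scale-in behavior more concrete, here is a minimal sketch of the replica calculation used by the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), clamped to a minimum and maximum. This is an illustration of the general Kubernetes formula only. The function name, the CPU-based metric, and the worker limits are assumptions made for the example, not Starburst Enterprise or ROSA defaults, and the sketch omits details such as the autoscaler's tolerance band.

```python
import math


def desired_workers(current_workers, current_cpu_pct, target_cpu_pct,
                    min_workers, max_workers):
    """Return the worker count a CPU-based autoscaler would aim for.

    Hypothetical helper: it applies the Kubernetes HPA formula
    ceil(current * currentMetric / targetMetric) and clamps the result
    to the configured bounds.
    """
    raw = math.ceil(current_workers * current_cpu_pct / target_cpu_pct)
    return max(min_workers, min(max_workers, raw))


# Spike in demand: 4 workers at 90% CPU against a 60% target -> scale out.
print(desired_workers(4, 90, 60, min_workers=2, max_workers=10))

# Demand levels out: 4 workers at 15% CPU -> scale in, but not below the minimum.
print(desired_workers(4, 15, 60, min_workers=2, max_workers=10))
```

In this sketch the first call asks for 6 workers and the second for 2, because the raw result of 1 is raised to the assumed minimum.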
2. Starburst Enterprise on ROSA (multi-region)
Global organizations often have data sources deployed in multiple regions. To meet regulatory and compliance requirements, organizations can choose to deploy Starburst Enterprise clusters in multiple regions and bridge the clusters together using the Starburst Stargate connector as shown in Figure 3. This deployment model also allows organizations to deploy Starburst Enterprise clusters close to their data to minimize network latency and maximize performance.
In this deployment model, Starburst Enterprise is deployed on ROSA following the same process as the single region model. The process is repeated for each region where a Starburst Enterprise cluster is needed. Clusters are bridged together using the Starburst Stargate connector. This connector makes it possible to add data sources from a remote cluster to a local cluster. Figure 3 shows an example of this deployment model. In this example, the local cluster is in Region A and the remote cluster is in Region B.
Data sources connected to the remote cluster can be added to the local cluster via Stargate. The remote data sources will appear alongside the data sources in the local cluster and can be queried as if they were connected directly to the local cluster. When a user submits a query to their local cluster, any portion of the query that involves a remote source will be processed by the remote cluster. This addresses data sovereignty regulations by processing the data in the location where it is stored. This also ensures that query processing is done as close to the data as possible to reduce latency and egress costs.
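The routing idea described here can be illustrated with a toy model. The sketch below groups the tables referenced by a query according to the cluster that owns each catalog, mirroring how the remote portion of a query is handled where the data lives. It is not Starburst's query planner: the catalog names, cluster names, table names, and the `split_by_cluster` helper are all hypothetical, and the SQL is only an example of what a cross-cluster join might look like from the local cluster.

```python
from collections import defaultdict

# Hypothetical catalog layout: catalog name -> cluster that processes it.
# "remote_sales" stands in for a catalog added to the local cluster via Stargate.
CATALOG_TO_CLUSTER = {
    "local_orders": "region-a",
    "remote_sales": "region-b",
}

# Example query as a user on the local cluster (Region A) might write it.
QUERY = """
SELECT o.order_id, s.amount
FROM local_orders.public.orders AS o
JOIN remote_sales.public.sales AS s
  ON o.order_id = s.order_id
"""


def split_by_cluster(tables, catalog_to_cluster):
    """Group fully qualified names (catalog.schema.table) by owning cluster."""
    work = defaultdict(list)
    for table in tables:
        catalog = table.split(".", 1)[0]
        work[catalog_to_cluster[catalog]].append(table)
    return dict(work)


plan = split_by_cluster(
    ["local_orders.public.orders", "remote_sales.public.sales"],
    CATALOG_TO_CLUSTER,
)
print(plan)
```

In this toy model, the `orders` table is assigned to the Region A cluster and the `sales` table to the Region B cluster, so only the results of the remote portion would need to cross regions.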
3. Starburst Enterprise on ROSA (local zones)
Customers requiring workloads to be even closer to them can use Starburst Enterprise on ROSA in AWS Local Zones, as shown in Figure 4. During the deployment of the ROSA cluster, users select the local zone of their choice for that particular cluster and then deploy Starburst Enterprise in the same way as they would in an AWS region. Clusters deployed in local zones can be bridged to clusters in other local zones, as well as to clusters deployed in AWS regions, using the Starburst Stargate connector. Review the Red Hat documentation for more details about ROSA on AWS Local Zones.
Summary
The combination of Starburst Enterprise and ROSA provides organizations with a robust, scalable, and flexible analytics solution tailored to meet the demands of modern data environments. By leveraging ROSA, businesses can offload the complexities of cluster management, allowing them to focus on deriving actionable insights from their data lakes.
The deployment options discussed in this blog (single region, multi-region, and local zones) offer organizations the versatility to optimize their data architecture according to their specific needs. Each approach not only enhances performance through reduced latency and improved resource allocation but also ensures compliance with data sovereignty regulations.
As data continues to grow in volume and complexity, adopting such innovative solutions will be critical for businesses aiming to stay competitive in an increasingly data-driven landscape. Embracing this technology is not just about keeping pace; it’s about leading the charge into the future of analytics.
To further explore using Starburst Enterprise with ROSA, reach out to your Starburst, Red Hat, or AWS account team to discuss your specific requirements. Additionally, you can learn more about deploying Starburst Enterprise on OpenShift by reviewing the Deployment guide.