I use Starburst Galaxy on AWS as a federated query engine to access our S3-based Iceberg data lake, Snowflake, and Redshift without duplicating data. This enables secure, high-performance analytics and machine learning workloads with consistent governance across all data sources.
Starburst Galaxy
Starburst
External reviews
External reviews are not included in the AWS star rating for the product.
Unified data access improves analytics and simplifies complex processes
What is our primary use case?
How has it helped my organization?
Starburst Galaxy has improved our organization by unifying access to all major data sources, reducing the need for complex ETL processes. In addition to our original use case, it has proven fast and reliable for Iceberg table maintenance, and it has enabled ingestion of Kafka feeds into our AWS S3 data lake, further increasing its value to our data platform.
What is most valuable?
The features I value most are federated querying across S3 Iceberg, Snowflake, and Redshift; native Iceberg table management tools that make maintenance operations simple and performant; and the ability to connect directly to Kafka for streaming ingestion. The federated query capability has also enabled me to build a Sigma Computing dashboard that pulls data from Postgres, BigQuery, and Snowflake through a single Starburst Galaxy connection, greatly simplifying data access and integration.
What needs improvement?
I would like to see better alerting integrations for failures and errors in scheduled tasks and maintenance jobs. I also want support for more connectors such as Kinesis and Firehose, support for more file types such as Avro and JSON, and message queue integration for object storage. A single view of query execution and optimization details, rather than needing to toggle between the Galaxy and Trino UIs, would be helpful. Additionally, enhanced control over account and environment variables, like that available in the Enterprise edition, would be beneficial.
Which solution did I use previously and why did I switch?
I previously used several query engines, including Athena, EMR, Redshift, Snowflake, and BigQuery. Starburst Galaxy’s federated query capabilities allowed me to join data across clouds and platforms, reducing complexity.
What's my experience with pricing, setup cost, and licensing?
I recommend tracking usage metrics from the start, focusing on data scanned and query concurrency, so you can right-size spend. If workloads are steady, you should explore commitment-based pricing for better rates and factor in the operational savings from not having to manage and scale your own Trino or query infrastructure.
Which other solutions did I evaluate?
I reviewed several options including Databricks and Dremio. I was an early adopter of Snowflake and still use it as well. Starburst Galaxy was a better fit for my technology stack and developers.
What other advice do I have?
I have found that Starburst Galaxy’s flexibility makes it worth experimenting beyond the initial deployment plan. Features I originally viewed as secondary, such as Iceberg maintenance and Kafka ingestion, have become everyday tools. Building a strong relationship with the Starburst team has also helped me optimize configurations and discover new capabilities faster.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Platform reduces management overhead by deploying multiple clusters and tracking costs efficiently while enhancing performance with low-latency responses
What is our primary use case?
Starburst Galaxy serves as our primary SQL-based data processing engine, a strategic decision driven by its seamless integration with our AWS cloud infrastructure and its ability to deliver high performance with low-latency responses.
The platform provides a comprehensive suite of functionalities that significantly enhance the daily operations of our data engineers and data analysts.
How has it helped my organization?
Starburst Galaxy has been instrumental in reducing the maintenance effort and management overhead of our Trino cluster, which is particularly valuable given our lean platform team responsible for Kovi's data infrastructure.
The platform has enabled us to deploy multiple clusters for different purposes while providing clear cost tracking and utilization monitoring capabilities.
What is most valuable?
The most relevant functionalities today are cluster autoscaling for intensive load periods and automated metadata management through cleaning, compression, and orphaned file deletion in Iceberg.
These capabilities significantly reduce reading costs, storage expenses, and query processing overhead.
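As a rough illustration of what such maintenance involves, the sketch below builds the three statements that Trino's Iceberg connector exposes for compaction, snapshot expiry, and orphan file removal. Mapping the "cleaning, compression, and orphaned file deletion" mentioned above onto these commands is an assumption, as are the table name `lake.sales.orders` and the threshold values; Galaxy's automated maintenance may work differently.

```python
def iceberg_maintenance_statements(
    table: str,
    retention: str = "7d",
    file_size_threshold: str = "128MB",
) -> list[str]:
    """Build SQL for one maintenance pass on an Iceberg table.

    The table name and thresholds are illustrative assumptions,
    not values taken from the review.
    """
    return [
        # Compact small data files into larger ones.
        f"ALTER TABLE {table} EXECUTE optimize(file_size_threshold => '{file_size_threshold}')",
        # Drop snapshots older than the retention window.
        f"ALTER TABLE {table} EXECUTE expire_snapshots(retention_threshold => '{retention}')",
        # Delete files no longer referenced by any snapshot.
        f"ALTER TABLE {table} EXECUTE remove_orphan_files(retention_threshold => '{retention}')",
    ]


if __name__ == "__main__":
    # Hypothetical fully qualified table name: catalog.schema.table
    for statement in iceberg_maintenance_statements("lake.sales.orders"):
        print(statement)
```

Each statement would be submitted through whatever SQL client is already connected to the cluster. Generating them from one function keeps the retention window consistent across the expiry and orphan-file steps.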
What needs improvement?
I maintain weekly conversations with Starburst's development and support teams, which provides me with visibility into the product roadmap and evolution.
Currently, my primary need is the impersonation functionality for BI solutions within Starburst clusters, which would enable enhanced access control and data governance capabilities.
For how long have I used the solution?
I have used the solution for almost 2 years.
Which solution did I use previously and why did I switch?
Previously, I utilized the AWS stack with Redshift and Athena.
I chose to migrate to Starburst Galaxy due to their expertise with Trino, superior aggregate cost structure compared to my previous solutions, and the rapid product evolution with new functionalities, problem corrections, and performance improvements.
What's my experience with pricing, setup cost, and licensing?
Starburst Galaxy's pricing model is simple to understand and easy to predict, so there are no hidden surprises.
Everything is transparent and accessible through the product console.
The one thing to watch is S3 and data transfer costs, which should also be included when calculating the total cost.
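A minimal sketch of that calculation, assuming the monthly figures are pulled from the Galaxy console and the AWS bill. The field names and the dollar amounts in the usage example are hypothetical placeholders, not real pricing.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Monthly cost components in one currency (all values hypothetical)."""

    galaxy_compute: float   # from the Galaxy console
    s3_storage: float       # from the AWS bill
    s3_requests: float      # GET/PUT/LIST request charges
    data_transfer: float    # cross-region or egress transfer

    def total(self) -> float:
        """Total cost of ownership for the month."""
        return (
            self.galaxy_compute
            + self.s3_storage
            + self.s3_requests
            + self.data_transfer
        )

    def non_compute_share(self) -> float:
        """Fraction of the total that is not Galaxy compute."""
        return (self.total() - self.galaxy_compute) / self.total()


if __name__ == "__main__":
    costs = MonthlyCosts(
        galaxy_compute=1000.0,
        s3_storage=200.0,
        s3_requests=50.0,
        data_transfer=150.0,
    )
    print(f"total: {costs.total():.2f}")
    print(f"non-compute share: {costs.non_compute_share():.1%}")
```

Tracking the non-compute share separately makes it easier to notice when storage or transfer, rather than cluster usage, is driving the bill.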
Federated querying delivers integrated data at record speed and reduces processing time
What is our primary use case?
We use Starburst Galaxy to query data across our diverse data ecosystem. Our data has evolved over many years and is spread across many data sources. Starburst enables us to query across this ecosystem without having to move everything into a single location.
Our teams require a method for integrating data from various systems for reporting and ad-hoc analysis, and Starburst Galaxy fundamentally meets this need.
How has it helped my organization?
The biggest win has been the ability to combine data from multiple sources and deliver it to the business at record speed.
This capability has allowed us to query directly through Starburst Galaxy, enabling teams to access integrated data that would otherwise be hard to pull together.
This has reduced both our ETL processing time and storage costs. We are answering questions that would have been hard, if not impossible, to answer previously because the data came from disparate, disconnected sources.
What is most valuable?
Federated querying through Starburst Galaxy has unlocked our ability to move data using SQL, keeping data in the data layer. The ability to use SQL to query multiple data sources and then write to a single destination has been essential.
Additionally, setting up new data connections is straightforward.
What needs improvement?
I would like to see per-model cluster routing selection when using dbt. Cluster startup time can be slow, sometimes taking over a minute.
Which solution did I use previously and why did I switch?
We started using Trino, which worked, but we wanted a reliable managed solution to help us scale.
What's my experience with pricing, setup cost, and licensing?
The pricing is transparent and reasonable.
Which other solutions did I evaluate?
We considered using open source Trino.
What other advice do I have?
Starburst Galaxy addresses our primary problem of managing and working with data spread across multiple systems. Our teams can access and combine data from any source, enabling faster insights and reducing the time spent on manual data wrangling.
Starburst Galaxy is becoming a cornerstone of our data platform, empowering us to make smarter and faster decisions across the organization.
Cost-effective data transformation and efficient querying enhance productivity
What is our primary use case?
Our primary use case is to manage hundreds of terabytes of data efficiently across a wide range of internal use cases, including ingestion/ETL, machine learning pipelining, and customer-facing product workflows.
It is a top priority to enable all engineers to have access to this volume of data without the concern of overspending on expensive cloud warehouse providers.
How has it helped my organization?
We have experienced several improvements across our organization.
Our data ingestion processes previously involved copying data from S3 to Snowflake, which was fairly costly and required constant vigilance to purge old data so that our source tables would not bloat.
Now we are able to move ingestion staging data to Iceberg tables, resulting in a much better experience in terms of both compute and storage costs as well as maintenance.
Data transformation has also become more efficient.
Starburst on Trino, combined with our SQL-native data transformation tool SQLMesh, has delivered a two- to five-fold improvement in compute performance across our transformation DAG.
This improvement is largely due to how efficiently Trino scans relevant data without requiring any additional setup, such as defining partitions in Snowflake.
In terms of cost effectiveness, we are already forecasting a 25% reduction in cloud data provider spending, even while continuing to use both Snowflake and Starburst.
This is because we are able to shift a significant amount of compute to Galaxy, and the cost difference compared to our previous approach of running jobs exclusively on Snowflake is substantial.
What is most valuable?
Cross-catalog querying and compatibility with AWS Glue have both significantly enhanced the user experience.
We operate several accounts within our AWS organization, each containing substantial volumes of data, and the onboarding process with Starburst has been fairly quick, even in the face of AWS IAM complexities.
What needs improvement?
The most persistent issue is the cluster spin-up time.
Coming from Snowflake, where warehouse spin-ups are nearly instantaneous, it has been a challenge to adapt.
However, I believe the Starburst team is working on solutions for this.
Additionally, the cluster and query monitoring UI lacks an optimal user experience.
I would recommend that the Starburst team invest in forking the Trino console and enhancing that tool, as observability is very important to us.
More Starburst-specific documentation would also be helpful.
I understand that some Trino functionality, such as certain parameters, is not supported, so clearer guidance would be appreciated.
Which solution did I use previously and why did I switch?
We previously used only Snowflake but are now shifting toward a more hybrid architecture.
We primarily added Starburst to our stack due to the potential for significant cost savings and because implementing a lakehouse is a more effective long-term data strategy.
What's my experience with pricing, setup cost, and licensing?
The setup cost is fairly transparent.
There are many opportunities to find cost savings or discounts, especially for a startup like ours.
I appreciate that the pricing is available online, although I will note that comparable compute is only slightly cheaper than Snowflake warehouse costs, for example.
Which other solutions did I evaluate?
We considered Onehouse and ClickHouse as alternative solutions.
What other advice do I have?
We are in the early phases of our Starburst relationship and are looking forward to how we can grow with it in the future.
Query federation and consistent SQL interface optimize data integration and analysis
What is our primary use case?
I use Starburst as a cost-efficient hosted option for Trino for data integration and ad-hoc analysis across a broad range of data sources. It is surprisingly useful to query SQL Server, a Google Sheet, and data in a blob store in a single statement, and then persist the result in Postgres for downstream consumption.
In addition, Galaxy platform features such as job scheduling, a data catalog, and easy permission and access control management, together with the strong technical support from Starburst, make it a breeze to use compared to something like Athena.
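The pattern described above can be sketched as a single `CREATE TABLE ... AS SELECT` statement that reads from several catalogs and writes to another. All catalog, schema, table, and column names below (`sqlserver`, `sheets`, `lake`, `postgres`, and so on) are hypothetical; they depend on how the catalogs are configured in a given Galaxy account.

```python
# Hypothetical cross-catalog query: every identifier here is an assumption.
SELECT_SQL = """
SELECT o.order_id, o.amount, r.region_name, s.shipped_at
FROM sqlserver.dbo.orders AS o
JOIN sheets.default.region_map AS r
  ON o.region_code = r.region_code
JOIN lake.raw.shipments AS s
  ON o.order_id = s.order_id
"""


def build_federated_ctas(target: str, select_sql: str) -> str:
    """Wrap a cross-catalog SELECT in a CREATE TABLE AS statement.

    `target` is a fully qualified name (catalog.schema.table) in the
    catalog that should receive the result.
    """
    return f"CREATE TABLE {target} AS\n{select_sql.strip()}"


if __name__ == "__main__":
    print(build_federated_ctas("postgres.public.order_summary", SELECT_SQL))
```

Because each table is addressed as `catalog.schema.table`, the engine can join the sources directly, and the Postgres table can then serve downstream consumers without the cluster staying up.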
How has it helped my organization?
I have removed data silos, sped up my pipelines (three to five times the speed of Redshift on a per-cost basis), and now have a single point of entry with consistent SQL semantics to all of my data systems.
What is most valuable?
Query federation coupled with excellent performance is the best feature by far. A consistent interface to all my data systems and a friendly UI that supports data personas from Analyst to Architect and everyone in between are extremely valuable.
What needs improvement?
As a hosted option, I wish I had more control over the cluster configuration, specifically regarding some of the more advanced options. Trino is extremely flexible and powerful, but some of this functionality is gated on the Galaxy platform.
Most users and admins will never need these features, but on occasion I have encountered issues that could have been resolved by a configuration change in five minutes rather than redesigning a data product. That said, I have a high degree of expertise with the tool, and this is more of a quibble than a major issue.
For how long have I used the solution?
I have used the solution for four years.
Which solution did I use previously and why did I switch?
If I have the choice of tooling for managing and interacting with data systems, I always choose Trino first and Starburst Galaxy if I am responsible for managing the deployment. My current team deployed on Redshift before I joined, and the first and best architectural choice I made was to migrate to Galaxy.
What's my experience with pricing, setup cost, and licensing?
You pay for cluster uptime. It is important to be aggressive about autoscaling, as a single worker will get you a long way. I recommend never connecting a BI tool to your Galaxy cluster. Instead, write the data to Postgres or a hot database and serve it from there so you don't pay for expensive uptime to serve dashboards.
Which other solutions did I evaluate?
Having a good amount of expertise in the domain, I knew that Galaxy was the right choice for quick deployment. Having managed data at scale (hundreds of terabytes) in the past, I know Trino will get the job done without a lot of hassle.
Athena specifically has two major issues. First, connectors are restricted on write functionality and are more difficult to configure. Not being able to write through connectors is a deal breaker. Second, if you scale out enough, you will encounter issues due to Athena's shared tenancy model and then need to migrate to Trino eventually. It is better to save yourself the hassle.
What other advice do I have?
If you are unsure about the service, try the free trial. You can be up to speed with your existing systems in half an hour.
Combining organizational data seamlessly with reduced operational costs while creating integrated dashboards
What is our primary use case?
I use Starburst Galaxy to connect to many Amazon S3 and RDS data sources, exposing that data for query and analysis by data engineering teams, as well as executive stakeholders in the organization.
I also use the product to serve many Tableau dashboards used by different teams within the organization.
How has it helped my organization?
I am able to combine data from across the organization to create integrated dashboards that are difficult to construct otherwise.
The on-demand nature of Starburst Galaxy has greatly reduced the computation and operational costs to achieve this compared to other open source tools I have used in the past. With Starburst Galaxy, the data is ready and available 24/7.
What is most valuable?
Starburst Warp Speed has helped me reduce overall operating expenses compared to standard query performance.
I am now able to answer questions in a couple of minutes that would otherwise take hours or days of time for my data engineering teams. I have found the cluster management to be extremely useful. I am able to create clusters configured for various workloads and then turn each one on as needed and let it turn itself off when idle.
This has enabled a number of new use cases. I am able to run much larger jobs than in the past without blocking small concurrent tasks. All of my processes are now running on a cluster that is right-sized instead of trying to manage my own infrastructure by scaling up or down numerous times throughout the day. I am also able to segment costs by product, which I was not able to do in the past.
What needs improvement?
I am able to connect Starburst Galaxy to other tools such as Tableau using the connector, but I would like to see better support for spinning up a cold Starburst Galaxy cluster via Tableau, as it currently just times out.
I would like the Starburst connector in Tableau to have the capability to hold the connection open while Starburst Galaxy starts up.
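For clients that you control yourself, such as scripts or schedulers, one possible workaround is to retry the connection with a growing delay while the cluster starts. The sketch below is a generic pattern and does not change how the Tableau connector behaves. The `connect` callable, the choice of `ConnectionError`, and the delay values are all assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 6,
    initial_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical zero-argument function that opens a
    connection and raises ConnectionError while the cluster is still cold.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(delay)
            delay *= 2  # exponential backoff
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    state = {"calls": 0}

    def fake_connect() -> str:
        # Simulates a cluster that becomes reachable on the third try.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("cluster is starting")
        return "connected"

    # Pass a no-op sleep so the demo finishes instantly.
    print(connect_with_retry(fake_connect, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable only so the behaviour can be exercised without real waiting.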
For how long have I used the solution?
I have used Starburst Galaxy for 1.5 years.
Which solution did I use previously and why did I switch?
I previously used Starburst Enterprise on premise in Amazon Web Services.
I switched to Starburst Galaxy to take advantage of automatic feature upgrades as well as shifting infrastructure costs to Starburst's cloud environment, which operates data workloads more efficiently than the Starburst Enterprise on premise solution.
What's my experience with pricing, setup cost, and licensing?
Pricing for Starburst Galaxy is competitive compared to running my own workloads using open source alternatives.
I recommend you consider the total cost of ownership when deciding whether Starburst Galaxy is a good fit for your organization.
Which other solutions did I evaluate?
I compared Starburst Galaxy to Starburst Enterprise and decided to make the switch to their cloud offering.
What other advice do I have?
This product is worth your time to investigate and evaluate.
I highly recommend Starburst Galaxy to any organization with a need to work with data at scale in the cloud, even in a multi-cloud environment.
Starburst - A best of breed Data Lakehouse
One of the best products, with outstanding service
They take all our feedback seriously and try to incorporate the required features in future releases.
Simplifies a lot of our business problems.
One of the key components for data federation in our Data Mesh architecture.
An innovative, high-value product for all data users at scale.
No more waiting for the pixels to dry!
**What I like best:**
1. **Federated Queries:** Starburst's federated querying capability is a game-changer. It allows us to query data across different sources such as Hadoop, S3, and relational databases like MySQL and PostgreSQL with ease. This means there's no need for complex ETL processes to move data around, saving time and reducing the risk of errors.
2. **Performance Enhancements:** Starburst provides additional performance enhancements over Trino's already fast query engine. It optimizes query execution to minimize latency and maximize efficiency, which is vital for our large datasets and complex queries.
3. **Security Features:** Enterprise-grade security is another standout feature. Starburst provides added layers of protection including data encryption, access controls, and integration with our existing security systems. This ensures our data is not only quickly accessible but also secure.
4. **Scalability:** The scalability of Starburst is impressive. As our data demands grow, Starburst scales with us, ensuring that our analytical capabilities can keep pace with our expanding data footprint without degradation in performance.
5. **Professional Support and Services:** Access to professional support and services ensures that any issues are promptly addressed, and we can also optimize our usage of Starburst with expert guidance.
In conclusion, Starburst provides a robust solution for organizations looking to harness the full power of Trino with additional enterprise capabilities. Its ability to query across various data sources efficiently, enhanced performance, and strong security measures make it an invaluable tool for any data-driven enterprise.