I have several data sources, including Oracle, DB2, and a MongoDB cluster, and I join them using Starburst Galaxy's federated querying. I then use Starburst Galaxy to transform the result into Iceberg tables that land in S3 storage, and I use those tables for dashboarding in Power BI or Tableau.
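The pipeline described above (a federated join across Oracle, DB2, and MongoDB, with the result landed as Iceberg tables on S3) maps naturally onto a single Trino `CREATE TABLE ... AS SELECT` statement. The Python sketch below only assembles such a statement as a string. The catalog names (`oracle_erp`, `db2_core`, `mongo_app`, `lake`), schemas, tables, and columns are all hypothetical, and it assumes `lake` is an Iceberg catalog backed by S3. Each one-to-many source is pre-aggregated in its own CTE so that joining two of them to the same customer does not inflate the totals.

```python
# Hypothetical fully qualified names (catalog.schema.table); replace them with
# the catalogs actually configured in your own Starburst Galaxy account.
SOURCES = {
    "customers": "db2_core.crm.customers",
    "orders": "oracle_erp.sales.orders",
    "events": "mongo_app.web.events",
}


def build_ctas(target: str, sources: dict) -> str:
    """Build a Trino CTAS statement that joins three federated sources and
    writes the result as a new Parquet-backed table in the target catalog."""
    return f"""CREATE TABLE {target}
WITH (format = 'PARQUET')
AS
WITH order_totals AS (
    SELECT customer_id, sum(amount) AS total_amount
    FROM {sources['orders']}
    GROUP BY customer_id
),
event_counts AS (
    SELECT customer_id, count(*) AS event_count
    FROM {sources['events']}
    GROUP BY customer_id
)
SELECT c.customer_id,
       c.segment,
       coalesce(o.total_amount, 0) AS total_amount,
       coalesce(e.event_count, 0) AS event_count
FROM {sources['customers']} AS c
LEFT JOIN order_totals AS o ON o.customer_id = c.customer_id
LEFT JOIN event_counts AS e ON e.customer_id = c.customer_id"""


print(build_ctas("lake.analytics.customer_summary", SOURCES))
```

Because all three sources are addressed as `catalog.schema.table`, the engine plans the join itself; no separate ETL step copies the source data first.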
Starburst Galaxy
Starburst
External reviews
External reviews are not included in the AWS star rating for the product.
Unified data querying has accelerated petabyte-scale analytics and simplified dashboard delivery
What is our primary use case?
What is most valuable?
I rely most on querying across different databases and on automatic indexing in my day-to-day work, because as a data science architect I need query results within a very short time. Starburst Galaxy suits me best: if I miss my SLAs, my customers raise a case, and of the many tools I have tried, Starburst Galaxy fits best.
Starburst Galaxy has had a positive impact on my organization. We had been struggling with Denodo and Dremio, which had their own strengths but did not handle querying large volumes of data well, especially semi-structured or unstructured data. Starburst Galaxy addresses this with YAML and manifest files that drive automated maintenance, and it helps reduce the small file problem across different HDFS systems. Additionally, Starburst Galaxy has an MCP server that connects to various agentic pipelines, reducing time to market for data consumption.
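The small file problem mentioned above arises when a table accumulates many tiny data files, so every query pays per-file open and metadata costs. Compaction fixes it by rewriting small files into fewer, larger ones. The sketch below shows one way a compaction planner could group files, using generic first-fit-decreasing bin packing; it is an illustration, not Starburst Galaxy's actual algorithm, and the `DataFile` type and the 32 MB / 128 MB thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    size_mb: int


def plan_compaction(files, target_mb=128, small_mb=32):
    """Group files smaller than small_mb into rewrite batches whose combined
    size stays at or under target_mb (first-fit decreasing bin packing)."""
    small = sorted((f for f in files if f.size_mb < small_mb),
                   key=lambda f: f.size_mb, reverse=True)
    batches = []  # each batch is a list of DataFile
    totals = []   # running size in MB of the batch at the same index
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_mb <= target_mb:
                batches[i].append(f)
                totals[i] += f.size_mb
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([f])
            totals.append(f.size_mb)
    # Rewriting a single file on its own gains nothing, so drop those batches.
    return [b for b in batches if len(b) > 1]


files = [DataFile(f"part-{i:03d}.parquet", 10) for i in range(20)]
files.append(DataFile("part-big.parquet", 200))
for batch in plan_compaction(files):
    print(len(batch), "files ->", sum(f.size_mb for f in batch), "MB")
```

With twenty 10 MB files and one 200 MB file, the large file is left alone and the small ones are planned as two batches: 12 files (120 MB) and 8 files (80 MB).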
Unified data from diverse sources has created consistent client views and reshaped data strategy
What is our primary use case?
My main use case for Starburst Galaxy is to use it as a data federation tool, collect data from various data sources, and have a unified view of the data.
For example, suppose I need data for a client profile from five different data sources, each on a different database platform. I can pull data from all five sources and build a consolidated view of the client.
Those are the main use cases for Starburst Galaxy; basically, we are trying to build data products.
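The consolidated client view above comes down to one SQL query that reads tables living in different systems. In Starburst Galaxy those tables would be addressed as `catalog.schema.table`; the runnable analogy below uses Python's built-in `sqlite3` module, where each `ATTACH`ed in-memory database stands in for a separate source. It is only an analogy for federation, not Starburst code, and it uses three sources instead of five for brevity; all database, table, and column names are hypothetical.

```python
import sqlite3

# Analogy only: each ATTACHed in-memory database plays the role of a separate
# source system (Starburst Galaxy would expose these as separate catalogs).
conn = sqlite3.connect(":memory:")
for name in ("crm", "billing", "support"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

conn.executescript("""
CREATE TABLE crm.clients (client_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE billing.invoices (client_id INTEGER, amount REAL);
CREATE TABLE support.tickets (client_id INTEGER, status TEXT);
INSERT INTO crm.clients VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO billing.invoices VALUES (1, 100.0), (1, 50.0), (2, 75.0);
INSERT INTO support.tickets VALUES (1, 'open'), (2, 'closed'), (2, 'open');
""")

# One query spans all three "sources"; correlated subqueries aggregate each
# one-to-many source separately so totals are not multiplied by the join.
CLIENT_VIEW = """
SELECT c.client_id,
       c.name,
       (SELECT COALESCE(SUM(amount), 0) FROM billing.invoices AS i
         WHERE i.client_id = c.client_id) AS billed,
       (SELECT COUNT(*) FROM support.tickets AS t
         WHERE t.client_id = c.client_id AND t.status = 'open') AS open_tickets
FROM crm.clients AS c
ORDER BY c.client_id
"""
rows = conn.execute(CLIENT_VIEW).fetchall()
for row in rows:
    print(row)
```

Each printed row is one client's consolidated profile, for example `(1, 'Acme', 150.0, 1)`.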
What is most valuable?
Starburst Galaxy is very SQL-friendly, which stands out for me because I have used SQL on other platforms such as SQL Server, Teradata, and Oracle, and my queries port over with only minor changes.
Another feature I appreciate in Starburst Galaxy is its support for Iceberg tables on object storage, which helps optimize data storage and enables columnar scans that speed up queries.
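The speed-up from columnar scans comes from the file layout: Iceberg tables are typically written in columnar formats such as Parquet or ORC, so a query that needs one column can skip the others. The toy arithmetic below makes that concrete. It is a simplified model, not the real Parquet or Iceberg format: the column names and fixed byte widths are hypothetical, and it ignores compression, encoding, and min/max file pruning, all of which widen the gap further in practice.

```python
# Toy model: every value is assumed to take a fixed number of bytes.
COLUMN_WIDTHS = {"client_id": 8, "name": 32, "region": 16, "balance": 8}


def bytes_scanned(num_rows: int, columns_needed: list, layout: str) -> int:
    """Estimate bytes read for a query under a row or columnar layout."""
    if layout == "row":
        # A row-oriented file stores whole rows together, so every column
        # is read even when the query needs only one of them.
        return num_rows * sum(COLUMN_WIDTHS.values())
    if layout == "columnar":
        # A columnar file stores each column contiguously, so the reader
        # fetches only the requested columns.
        return num_rows * sum(COLUMN_WIDTHS[c] for c in columns_needed)
    raise ValueError(f"unknown layout: {layout}")


# e.g. SELECT sum(balance) FROM clients over one million rows
print(bytes_scanned(1_000_000, ["balance"], "row"))
print(bytes_scanned(1_000_000, ["balance"], "columnar"))
```

In this model, summing one 8-byte column over a million 64-byte rows reads 64 MB from a row layout but only 8 MB from a columnar one.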
Starburst Galaxy has positively impacted my organization by letting us rethink our data strategy and architect our data differently: instead of multiple siloed data marts, we now have a unified view.
What needs improvement?
One way Starburst Galaxy could be improved is through AI enablement. I have not yet seen how the user interface will work or how users will interact with data products in Starburst Galaxy through AI, so I am curious to learn more about that.
I chose a rating of eight because it has many good features, including data federation and the ability to write queries easily. I think there is room for improvement in AI adaptability and, more generally, in the number of connectors to other tools, which could be expanded.
For how long have I used the solution?
I have been using Starburst Galaxy for 18 months.
What do I think about the stability of the solution?
Starburst Galaxy is stable in my experience so far.
What do I think about the scalability of the solution?
I do not yet have enough visibility into the scalability of Starburst Galaxy. We keep adding data sources to it, so I expect it to scale, but the results are still pending.
How are customer service and support?
Starburst Galaxy customer support is good.
Which solution did I use previously and why did I switch?
Earlier, we were using traditional databases.
What was our ROI?
I have yet to see hard numbers on return on investment, but I expect it to save both money and time.
What other advice do I have?
My advice to others looking into Starburst Galaxy is to first understand your current data environment and make sure Starburst Galaxy has the right connectors for it. Have a dedicated team from Starburst help you through installation and onboarding, and make sure everyone who will work in that environment receives good training with proper use cases. I would also recommend setting up Starburst Galaxy in a sandbox in your environment so that you can get a feel for how it works with your data. I gave this product a rating of eight.
Outstanding Performance and Savings with Robust Governance
Outstanding Support Team Makes All the Difference
Streamlined Data Analytics with Excellent Support
Fast Data Queries and Robust Access Controls
Effortless Data Federation and Granular Governance Made Easy
Effortless AI Agent Creation with Robust Features
Unified data access improves analytics and simplifies complex processes
What is our primary use case?
I use Starburst Galaxy on AWS as a federated query engine to access our S3-based Iceberg data lake, Snowflake, and Redshift without duplicating data. This enables secure, high-performance analytics and machine learning workloads with consistent governance across all data sources.
How has it helped my organization?
Starburst Galaxy has improved our organization by unifying access to all major data sources, reducing the need for complex ETL processes. In addition to our original use case, it has proven fast and reliable for Iceberg table maintenance, and it has enabled ingestion of Kafka feeds into our AWS S3 data lake, further increasing its value to our data platform.
What is most valuable?
The features I value most are federated querying across S3 Iceberg, Snowflake, and Redshift; native Iceberg table management tools that make maintenance operations simple and performant; and the ability to connect directly to Kafka for streaming ingestion. The federated query capability has also enabled me to build a Sigma Computing dashboard that pulls data from Postgres, BigQuery, and Snowflake through a single Starburst Galaxy connection, greatly simplifying data access and integration.
What needs improvement?
I would like to see better alerting integrations for failures and errors in scheduled tasks and maintenance jobs. I also want support for more connectors, such as Kinesis and Firehose; more file types, such as Avro and JSON; and message queue integration for object storage. A single view of query execution and optimization details, rather than toggling between the Galaxy and Trino UIs, would also help. Additionally, I would benefit from the enhanced control over account and environment variables that the Enterprise edition would provide.
Which solution did I use previously and why did I switch?
I previously used several query engines, including Athena, EMR, Redshift, Snowflake, and BigQuery. Starburst Galaxy’s federated query capabilities allowed me to join data across clouds and platforms, reducing complexity.
What's my experience with pricing, setup cost, and licensing?
I recommend tracking usage metrics from the start, focusing on data scanned and query concurrency, so you can right-size spend. If workloads are steady, you should explore commitment-based pricing for better rates and factor in the operational savings from not having to manage and scale your own Trino or query infrastructure.
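The advice to track usage first and move to commitment-based pricing only for steady workloads is essentially a break-even calculation. The sketch below assumes a generic commitment model: a fixed block of usage units is prepaid at a discounted rate whether or not it is used, and anything above it is billed at the on-demand rate. The units, rates, and overage rule are hypothetical and are not Starburst's actual price list or contract terms.

```python
def on_demand_cost(usage: float, on_demand_rate: float) -> float:
    """Pay-as-you-go: every unit consumed is billed at the on-demand rate."""
    return usage * on_demand_rate


def committed_cost(usage: float, commit: float, commit_rate: float,
                   on_demand_rate: float) -> float:
    """Commitment (assumed model): the committed units are paid for even if
    unused, and usage above the commitment is billed at the on-demand rate."""
    overage = max(0.0, usage - commit)
    return commit * commit_rate + overage * on_demand_rate


def break_even_usage(commit: float, commit_rate: float,
                     on_demand_rate: float) -> float:
    """Usage at which the commitment costs the same as on-demand
    (meaningful when commit_rate is below on_demand_rate)."""
    return commit * commit_rate / on_demand_rate


# Hypothetical: commit to 1,000 units per month at 0.8 instead of 1.0 each.
for monthly_usage in (700, 800, 900):
    print(monthly_usage,
          on_demand_cost(monthly_usage, 1.0),
          committed_cost(monthly_usage, 1000, 0.8, 1.0))
```

Under these assumed numbers the commitment only pays off once steady monthly usage exceeds 80% of the committed volume, which is why measuring data scanned and concurrency before committing matters.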
Which other solutions did I evaluate?
I reviewed several options including Databricks and Dremio. I was an early adopter of Snowflake and still use it as well. Starburst Galaxy was a better fit for my technology stack and developers.
What other advice do I have?
I have found that Starburst Galaxy’s flexibility makes it worth experimenting beyond the initial deployment plan. Features I originally viewed as secondary, such as Iceberg maintenance and Kafka ingestion, have become everyday tools. Building a strong relationship with the Starburst team has also helped me optimize configurations and discover new capabilities faster.
Platform reduces management overhead by deploying multiple clusters and tracking costs efficiently while enhancing performance with low-latency responses
What is our primary use case?
Starburst Galaxy serves as our primary SQL-based data processing engine, a strategic decision driven by its seamless integration with our AWS cloud infrastructure and its ability to deliver high performance with low-latency responses.
The platform provides a comprehensive suite of functionalities that significantly enhance the daily operations of our data engineers and data analysts.
How has it helped my organization?
Starburst Galaxy has been instrumental in reducing the maintenance effort and management overhead of our Trino cluster, which is particularly valuable given our lean platform team responsible for Kovi's data infrastructure.
The platform has enabled us to deploy multiple clusters for different purposes while providing clear cost tracking and utilization monitoring capabilities.
What is most valuable?
The most relevant functionalities today are cluster autoscaling for periods of heavy load and automated Iceberg metadata management through cleanup, compaction, and orphaned file deletion.
These capabilities significantly reduce reading costs, storage expenses, and query processing overhead.
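The cleanup, compaction, and orphaned file deletion described above correspond to table procedures in the open-source Trino Iceberg connector: `optimize`, `expire_snapshots`, and `remove_orphan_files`. Galaxy automates this maintenance, and we assume (without confirmation from the review) that it builds on the same operations. The Python sketch below only assembles the `ALTER TABLE ... EXECUTE` statements you could run by hand; the table name is hypothetical, and the `7d` default mirrors open-source Trino's default minimum retention for the two cleanup procedures.

```python
def maintenance_statements(table: str, retention: str = "7d",
                           small_file_threshold: str = "128MB") -> list:
    """Return Trino Iceberg maintenance statements in a sensible order."""
    return [
        # 1. Compaction: rewrite files smaller than the threshold into larger ones.
        f"ALTER TABLE {table} EXECUTE optimize"
        f"(file_size_threshold => '{small_file_threshold}')",
        # 2. Cleanup: expire old snapshots so files they alone reference are deleted.
        f"ALTER TABLE {table} EXECUTE expire_snapshots"
        f"(retention_threshold => '{retention}')",
        # 3. Orphan deletion: remove files no snapshot references (e.g. failed writes).
        f"ALTER TABLE {table} EXECUTE remove_orphan_files"
        f"(retention_threshold => '{retention}')",
    ]


for stmt in maintenance_statements("lake.analytics.events"):
    print(stmt + ";")
```

Compacting before expiring snapshots means the pre-compaction files become unreferenced once their snapshots expire, so a single maintenance pass reclaims their storage.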
What needs improvement?
I maintain weekly conversations with Starburst's development and support teams, which provides me with visibility into the product roadmap and evolution.
Currently, my primary need is the impersonation functionality for BI solutions within Starburst clusters, which would enable enhanced access control and data governance capabilities.
For how long have I used the solution?
I have used the solution for almost 2 years.
Which solution did I use previously and why did I switch?
Previously, I utilized the AWS stack with Redshift and Athena.
I chose to migrate to Starburst Galaxy because of Starburst's Trino expertise, a lower aggregate cost than my previous solutions, and the rapid evolution of the product, with new functionality, bug fixes, and performance improvements.
What's my experience with pricing, setup cost, and licensing?
Starburst Galaxy's pricing model is simple to understand and easy to predict, so there are no major surprises.
Everything is transparent and accessible through the product console.
The only point of attention is that S3 storage and data transfer costs should also be included when calculating the total cost.
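The point about S3 and transfer costs is easy to quantify: the Galaxy console covers Galaxy's own charges, while storage and network transfer arrive on the AWS bill. The sketch below sums the three components and reports how much of the total sits outside the console. The field names and all dollar figures are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """One month of spend for a Galaxy-on-S3 setup (hypothetical fields)."""
    galaxy_compute: float  # visible in the Starburst Galaxy console
    s3: float              # S3 storage and requests, billed by AWS
    data_transfer: float   # network transfer, billed by AWS

    def total(self) -> float:
        return self.galaxy_compute + self.s3 + self.data_transfer

    def outside_console_share(self) -> float:
        """Fraction of total spend that the Galaxy console alone would not show."""
        total = self.total()
        if total == 0:
            return 0.0
        return (self.s3 + self.data_transfer) / total


month = MonthlyCost(galaxy_compute=1000.0, s3=200.0, data_transfer=300.0)
print(month.total(), round(month.outside_console_share(), 2))
```

With these made-up numbers, a third of the real monthly cost would be missed by looking at the Galaxy console alone.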