
Chronosphere SaaS
Centralized monitoring has unified alerts and dashboards for critical cloud applications
What is our primary use case?
Our main use case was monitoring our entire infrastructure and the application tech stack we ran in a cloud environment. On the application side we collected telemetry: traces, metrics, and logs. On the infrastructure side, we monitored metrics as well.
With Chronosphere specifically, we ran the OpenTelemetry Collector on a centralized platform and used it to route all the telemetry coming from our infrastructure and applications into Chronosphere, where we built our dashboards. The best part is that alerting is integrated directly into Chronosphere, so we don't need to set up a third-party alerting mechanism. Chronosphere helped us set up detailed dashboards with proper filtering of the incoming data, including custom filtering of traces. We set up an alert for every metric and trace we wanted to monitor, which is critical at the production level. We ran an e-commerce platform, so it was crucial for us to be notified of every order failure or similar event. Chronosphere let us set up those alerts quickly on a single platform instead of hopping between multiple platforms.
We integrated an entire monitoring setup. Data flowed from the OpenTelemetry Collector into Chronosphere, with a centralized collector picking up metrics and traces from different endpoints. We created dashboards and integrated alerting, which fed directly into notification tools such as Slack.
We also had EKS monitoring. We worked mostly in a data-oriented environment with a large data lake setup. Chronosphere offered custom automatic instrumentation, which made it easy to instrument our application and build a detailed monitoring setup that helped us in multiple scenarios. We had good options for monitoring multiple applications using different methods. One method is using OpenTelemetry, and we also used multiple Lake view setups. This was very useful.
What is most valuable?
Chronosphere offers several features I liked. One is the integration of alerting and dashboards into a single platform, where you can easily create a dashboard and set up monitoring for it. You can set up alerts on particular traces or metrics based on criticality and threshold values, and apply custom filtering to your metrics, traces, and logs. The severity tagging and automated grouping of incoming logs based on the keywords they contain is very helpful. For example, if a log is an error, it is automatically grouped. If we are looking for some particular value, that also helps us consolidate the matching logs. Dashboarding is very easy and the user interface is very user friendly.
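To make the keyword-based grouping concrete, here is a minimal Python sketch of the idea. It is not Chronosphere's implementation: the keyword table, the function names, and the sample log lines are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical keyword-to-severity mapping; not taken from Chronosphere.
SEVERITY_KEYWORDS = {
    "error": "error",
    "exception": "error",
    "warn": "warning",
}


def classify(line: str) -> str:
    """Return the severity of the first matching keyword, else 'info'."""
    lowered = line.lower()
    for keyword, severity in SEVERITY_KEYWORDS.items():
        if keyword in lowered:
            return severity
    return "info"


def group_by_severity(lines):
    """Group log lines into buckets keyed by their classified severity."""
    groups = defaultdict(list)
    for line in lines:
        groups[classify(line)].append(line)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative log lines only; not real production data.
    sample = [
        "ERROR order failed: payment declined",
        "checkout completed",
        "WARN inventory service slow to respond",
    ]
    for severity, items in group_by_severity(sample).items():
        print(severity, len(items))
```

A real platform would match on structured fields rather than substrings, but the grouping step is the same: classify each line, then bucket it.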
What needs improvement?
Some aspects of the UI could be improved and made more user friendly. When identifying specific log patterns and working out which metrics and logs are coming in, going to the log explorer and finding them there is a little difficult. It would be very useful if we could group by project and if that UI were a little more user friendly.
The user interface was a bit confusing in the beginning. To improve this, I believe we need more open or freely available courses on Chronosphere to help us understand the platform. The team provides detailed walkthroughs when you get started. However, proper video sessions or documentation would help us understand the tool better.
For how long have I used the solution?
I have been using Chronosphere for around one and a half years.
What do I think about the stability of the solution?
Chronosphere is very stable.
What do I think about the scalability of the solution?
Scalability was not an issue for us. We run it on a public cloud setup, so scalability was never a concern.
How are customer service and support?
The support team was very good for us. They had a dedicated person assigned to us who would help us very thoroughly in terms of having all the issues sorted out.
Before Chronosphere, we relied solely on dedicated open source alerting and monitoring tools. When we moved to Chronosphere, one standout was the excellent support team, which helped us solve many problems we had been unable to solve on the open source platforms. Integrating Chronosphere was fairly easy coming from an open source tool, and it helped us streamline our monitoring and alerting setup across the organization. That directly streamlined our processes, reduced errors, and improved our environment's uptime through alerts and quick responses.
We have a dedicated team of people who help us with setup as well as debugging errors. Anytime we reach out to them, they help us a lot.
I would recommend moving to Chronosphere if you're using an open source tool and looking for a paid solution, because it offers detailed customer support as well as a ton of features that would really help streamline your existing environment and processes. It would be a great option if you're looking to make a change.
Which solution did I use previously and why did I switch?
Previously we were using open source solutions like Grafana and Prometheus. The only reason we switched was that we had to hop between too many tools for a proper, detailed monitoring setup. Chronosphere served us better. We had to introduce separate agents to pick up metrics from applications. Logs came from different applications as data sources and were then added into Grafana. Creating dashboards and alerting from there took yet another tool, and Grafana was the only one we had for it. Once Chronosphere came in, everything was consolidated.
How was the initial setup?
I cannot be specific on those terms. The initial setup did take time, but once it was completed and all the environments were up and running, we saved time overall because we had all the templates ready. We pulled in a template and created a dashboard as and when needed. We onboarded all the applications onto Chronosphere, where we could monitor everything properly. In terms of staffing, the initial setup took some time and a group of people. However, once things got moving, very few employees needed to be involved.
What's my experience with pricing, setup cost, and licensing?
It was pretty decent and competitive compared to the market. I didn't feel it was on the expensive side. Given the features and functionality it provides, I feel the pricing is right on point.
Monitors pipelines with real-time alerts and good documentation
What is our primary use case?
I work as a data engineer, and we have many streaming pipelines. We use Chronosphere to monitor various metrics, such as how much data our pipeline processes in each batch, the volume of incoming data, our consumption rates, and the time to process each batch. Additionally, we set alerts in Chronosphere for situations like job failures or when the number of processed records falls below a certain threshold. Sometimes, we face silent failures, where our system appears to be working fine but isn't consuming any data because another system has stopped sending data. Chronosphere helps us detect these cases.
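The two alert conditions described here, a record count below a threshold and a silent failure where no data arrives at all, can be sketched in a few lines of Python. This only illustrates the logic and is not how Chronosphere evaluates alerts: `BatchSample`, `evaluate`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchSample:
    """One observation of a pipeline batch (hypothetical shape)."""
    timestamp: float        # seconds since epoch
    records_processed: int


def evaluate(samples, now: float, min_records: int,
             max_silence_s: float) -> Optional[str]:
    """Return an alert reason, or None when the pipeline looks healthy.

    "no_data":    no sample arrived within max_silence_s (silent failure).
    "low_volume": the latest batch processed fewer than min_records.
    """
    if not samples:
        return "no_data"
    latest = max(samples, key=lambda s: s.timestamp)
    if now - latest.timestamp > max_silence_s:
        return "no_data"
    if latest.records_processed < min_records:
        return "low_volume"
    return None
```

Treating "no recent sample" as its own condition is what catches the silent case: a job that is up but starved of input never reports a failure, so the absence of data has to trigger the alert.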
Another team member was involved in setting up a framework using Terraform on Chronosphere to monitor our job SLAs. We receive alerts on Slack or via email if any job fails to meet its SLA.
What is most valuable?
The alerting features are good because they provide many alerts you can't get from everyday tools like Airflow or other orchestration tools, which only notify you of job failures. With Chronosphere, I can set custom thresholds, such as on the number of records being processed.
I can also configure alerts for silent failures and jobs that take longer than expected. Chronosphere is completely loosely coupled with your pipeline, meaning it has no dependencies on it, which is pretty cool.
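As a rough illustration of alerting on jobs that take longer than expected, the Python sketch below flags jobs whose elapsed time exceeds an SLA. The job dictionary layout, the `sla_breaches` name, and the one-hour default are assumptions made for the example, not part of Chronosphere or of the Terraform framework mentioned above.

```python
from datetime import datetime, timedelta


def sla_breaches(jobs, now, default_sla=timedelta(hours=1)):
    """Return the names of jobs that exceeded their SLA.

    Each job is a dict with 'name', 'started_at', an optional 'finished_at',
    and an optional 'sla' (timedelta). A job that is still running is
    measured up to `now`.
    """
    breached = []
    for job in jobs:
        end = job.get("finished_at") or now
        elapsed = end - job["started_at"]
        if elapsed > job.get("sla", default_sla):
            breached.append(job["name"])
    return breached
```

Because the check runs outside the job itself, it keeps working when the job hangs, which is the benefit of monitoring being loosely coupled from the pipeline.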
What needs improvement?
Chronosphere isn't easy for everyone to use. It would be much easier if there were a simpler version, like a data number version or an SQL version. It's hard to debug if you don't know the syntax. Also, I saw the Slack alerts feature, which is pretty cool, but I cannot customize the messages in the Slack alerts. It would be great if it were possible to tag people in the alerts. At DoorDash, we have hundreds of pipelines, and if something fails, I want to tag specific people so they can start working on the issue immediately.
For how long have I used the solution?
I have been using Chronosphere for one year.
What do I think about the stability of the solution?
Sometimes, we noticed issues because we used the push method to send our metrics to Chronosphere using Prometheus.
I rate the solution's stability a nine out of ten.
What do I think about the scalability of the solution?
100 users are using this solution, and we have a significant number of metrics coming in. It's highly scalable and can handle the load efficiently.
How are customer service and support?
The documentation is pretty good.
What other advice do I have?
If you want to monitor pipelines and use something like Kafka or any streaming platform, Chronosphere is the best option for monitoring pipelines with real-time alerts. It is loosely coupled with your pipeline, adding no confusion or load. I recommend using it.
Overall, I rate the solution a nine out of ten.
An exceptional, pragmatic observability platform
* Support: knowledgeable, friendly support from day one. I assumed this was pre-contract wooing, but one year later, support is as great as ever.
* Ergonomics: tools are only useful if they're used. Their interfaces load quickly and make sense, and as a result, our engineers are happy to use them.
* Operations: the product just works; the undifferentiated heavy lifting is handled behind the scenes, without incident.
* Cost: they include tools to identify and tame anomalous and low-value data, leading to lower costs without sacrificing signal.
* Prometheus histograms are clunky. It sounds like this may be addressed soon.
A solid observability service
Additionally, some tips for people new to using it, like those in AWS CloudWatch Insights, would be helpful.
Many features
Chronosphere as Observability
A necessary, scalable observability platform run by a stellar team
Early in the onboarding process, I was blown away when we discussed the tools available in the aggregation tier. It appeared that giving us this level of control wouldn't be good for Chronosphere's revenue in the long run, but I soon realized it's part of their philosophy and mission to give the power back to the customers. As an operator coming from an ELK-based stack (which comes with plenty of operational toil), Chronosphere is a true SaaS where you don't need to worry about the underlying storage and query infrastructure.
The profiling tools that allow you to look at incoming data at various process stages have been handy in many cases. The backend ingest and query performance have been phenomenal, especially compared to our legacy stack. Being able to use rollups to extend the retention of data will prove helpful to us in the long run, which isn't something we've been able to do effectively in our legacy stack.
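For readers unfamiliar with rollups, the general idea is to downsample fine-grained points into coarser buckets so that older data costs less to keep. The Python sketch below shows that idea only. It does not reflect how Chronosphere implements rollups, and the `rollup` function and its mean aggregation are assumptions chosen for illustration.

```python
from collections import defaultdict


def rollup(points, bucket_s: int):
    """Downsample (timestamp, value) points into fixed-width buckets.

    Returns a sorted list of (bucket_start, mean_value) tuples. Averaging is
    one possible aggregation; sums, maxima, or percentiles are also common.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_s].append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]
```

Five points at sub-minute resolution become three one-minute points here, and the same step applied at hourly or daily widths is what makes long retention affordable.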
Product aside, the team has been highly supportive throughout the process, from onboarding to implementation to stabilization. They've been a solid partner for the complex project of moving observability stacks within a large engineering organization.
This challenge is somewhat outside of the sphere of responsibility of Chronosphere. Like the Chronosphere collector, I think there might be some opportunity for productized tooling on the library side to help solve common problems across all organizations working with Prometheus. Still, on the bright side, the team is working on backend features like the usage profiler to give us the next level of visibility.
Review from Kevin
Scalable metrics storage with M3DB
* Built on industry standards: Prometheus and Grafana
* Uses M3DB for scalable storage
* Customer is not required to manage storage scaling, sharding, or federation
* Knowledgeable, hands-on support by account managers