We use the solution for data science and machine learning.
Dataiku Trial
Dataiku External reviews
External reviews are not included in the AWS star rating for the product.
Initial stages
Very intuitive
Great Product With Tons of Potential
KYC Risk model
Dataiku great data science platform
Dataiku review
Review of dataiku as a developer
Dataiku is Awesome
Transform raw data into structured, ready-to-use assets using intuitive tools enhanced by AI-driven suggestions, auto-schema detection, and intelligent type recognition.
🧪 Continuous Development
Support agile analytics with a CI/CD-style environment where data flows, scripts, and models evolve continuously, promoting rapid iteration and improvement.
⚙️ Ease of Implementation
Minimize setup complexity with modular components, drag-and-drop interfaces, and seamless integration with existing data ecosystems (cloud, on-prem, hybrid).
✅ Robust Data Validation
Ensure data quality through built-in validation checks, profiling dashboards, and the flexibility to implement custom Python logic for complex or domain-specific rules.
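To illustrate what a custom, domain-specific rule of this kind might look like, here is a minimal sketch in plain Python. It does not use Dataiku's API. The function name `validate_rows`, the field names `amount` and `country`, and the rules themselves are hypothetical, chosen only for the example.

```python
def validate_rows(rows, allowed_countries):
    """Return a list of (row_index, message) pairs for rows that break a rule.

    Hypothetical rules, for illustration only:
      - "amount" must be a non-negative number
      - "country" must be one of the allowed codes
    """
    errors = []
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            errors.append((i, "amount must be a non-negative number"))
        if row.get("country") not in allowed_countries:
            errors.append((i, "unknown country code"))
    return errors


# Hypothetical sample records
rows = [
    {"amount": 120.0, "country": "FR"},
    {"amount": -5, "country": "FR"},
    {"amount": 40, "country": "XX"},
]
errors = validate_rows(rows, {"FR", "DE"})
for index, message in errors:
    print(f"row {index}: {message}")
```

Returning the failures as data, rather than raising on the first bad row, makes it easy to report every problem in one pass or to route failing rows to a separate output.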
🧠 Scenario Building
Model and simulate different business or analytical scenarios using parameterized workflows, branching logic, and reusable components to support what-if analyses.
🌀 Flow Zones
Organize and manage data processes in "Flow Zones" — clearly defined stages (e.g., Ingest → Transform → Validate → Output) that make pipeline orchestration transparent and scalable.
📚 Integrated WIKI Page
Empower collaboration and knowledge sharing with an embedded WIKI page. Document logic, share best practices, track changes, and onboard new users effortlessly.
🚧 Key Pain Points:
Performance Bottlenecks:
Executing complex scenarios on large datasets directly in the DSS engine is slow and resource-intensive, often making it impractical for time-sensitive analytics.
Dependence on External Engines:
To achieve acceptable performance, teams must offload processing to SQL or Spark engines, requiring:
Additional infrastructure setup (clusters, permissions, connections)
Advanced SQL or PySpark expertise, which can be a barrier for data analysts or citizen data scientists.
Debugging Overhead:
Troubleshooting large workflows is cumbersome due to:
Limited transparency into underlying code execution
Multi-layered architecture (visual flow → Spark/SQL translation → execution engine)
Slower iteration cycles, especially with Spark
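The trade-off behind offloading work to an external engine can be sketched with a small, generic example. This is not Dataiku code: it uses Python's built-in `sqlite3` module as a stand-in for a SQL engine, and the `sales` table and its columns are hypothetical. The first approach pulls every row into the application and aggregates there; the second pushes the aggregation down to the engine so only the result travels back.

```python
import sqlite3

# In-memory SQLite database standing in for an external SQL engine (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# Approach 1: fetch every row and aggregate in the application process
totals_local = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_local[region] = totals_local.get(region, 0.0) + amount

# Approach 2: push the aggregation down so the engine returns one row per region
totals_pushed = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

conn.close()
print(totals_local)
print(totals_pushed)
```

Both approaches give the same totals, but the second moves far less data as the table grows. It also requires someone on the team to write and debug the SQL, which is the expertise barrier described above.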
Prebuilt validation rules with customizable logic (Python/SQL)
Auto-profiling and anomaly detection at ingest
Validation integrated directly into data pipelines and alerts
🧠 Smart Data Ingestion & Reading
Intelligent schema detection, auto-type inference, and data previews
Efficient sampling of large datasets without full-load requirements
Flexible connectors for cloud, on-prem, and APIs with minimal setup
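One general technique for sampling a large dataset without loading it fully is reservoir sampling, which keeps a fixed-size uniform sample while reading the data once. The sketch below is a generic illustration in plain Python, not a description of how Dataiku samples internally. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items drawn uniformly from an iterable, in one pass.

    Only k items are held in memory at any time (Algorithm R).
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1)
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample


# Sample 5 values from a stream of 100,000 without materializing it as a list
sample = reservoir_sample(iter(range(100_000)), k=5, seed=0)
print(sample)
```

Because the stream is consumed item by item, the same function works on a file handle or a database cursor as well as on an in-memory sequence.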
📊 Quick Insights Through Data Visualization
One-click data summaries with charts, distributions, and KPIs
Drill-down capabilities for root-cause analysis
Seamless embedding of visuals into flows, dashboards, and WIKI pages
🔐 Built-in Data Governance
Centralized metadata catalog and lineage tracking
Role-based access controls and audit trails
Versioning, change tracking, and approval workflows
Integration with data privacy and compliance frameworks (GDPR, HIPAA, etc.)
Good user experience but limited capabilities
What is our primary use case?
How has it helped my organization?
We were a team of six data scientists and one data engineer. We used Dataiku for all our data science tasks: data preparation, preprocessing, benchmarking machine learning algorithms, handling everything related to production, and making our algorithms available to stakeholders.
What is most valuable?
The advantage is that you can focus on machine learning while having access to what they call 'recipes.' These recipes allow me to preprocess and prepare data without writing any code. This saves a lot of time because I can quickly handle all the data preparation tasks and concentrate on building my machine learning algorithms.
What needs improvement?
One of the main challenges was collaboration. Developers typically use GitHub to push and manage code, but integrating GitHub with Dataiku was complicated. While it was theoretically possible to use GitHub with Dataiku, in practice, it was difficult to manage our code effectively and push it from Dataiku to GitHub.
Another limitation was its ability to handle different types of data. While Dataiku is powerful for working with structured data, like regular or geospatial data, it struggled with more complex data types such as text and images. Along with the GitHub integration problems, this limited support for diverse data types was the other main gap at that time.
For how long have I used the solution?
I have been using Dataiku for over a year.
What do I think about the stability of the solution?
Since Dataiku relies on various open-source libraries and tools, updates or upgrades to these components can sometimes impact the stability of Dataiku's features. This can make it challenging to maintain consistent stability, as changes in the underlying open-source tools can affect how Dataiku functions.
I rate the stability as six out of ten.
What do I think about the scalability of the solution?
There are some scalability issues.
I rate the scalability as seven out of ten.
How are customer service and support?
Technical support was very good compared to other tools. We had access to chat and support.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup is very easy, and there are many tutorials and guides. After the initial deployment, it took about a week to finish the setup and resolve various issues before we had a stable version of Dataiku that we could use consistently.
I rate the initial setup as eight out of ten, where ten is easy.
What's my experience with pricing, setup cost, and licensing?
It is very expensive.
What other advice do I have?
I wouldn't recommend using Dataiku if only one data scientist is on the team. For a larger team, say more than five data scientists, it can be very helpful. Dataiku offers features that are especially useful when multiple people are working on the same project, and it also has tools that make it easier to move from the proof-of-concept stage to production.
Overall, I rate the solution as seven out of ten.