Overview

This trial version of Dataiku allows you to deploy into your AWS environment for prototyping, testing and evaluating the full extent of Dataiku capabilities.
Dataiku is The Universal AI Platform™, empowering teams to deliver AI and analytics projects faster - all within a secure, collaborative, and governed environment.
- Data Scientists use familiar tools to focus on high-impact work, with automation and streamlined collaboration.
- Business Analysts get faster insights with intuitive data prep and accessible machine learning.
- Data Teams scale projects with built-in governance and transparency.
Built for AWS:
- Connect securely to all data sources, including Amazon S3, Amazon Redshift, and Amazon RDS.
- Scale data and ML processing with Dataiku elastic compute powered by Amazon EKS for Python, R, Spark, and more.
- Accelerate AI development with pre-built workflows integrating AWS AI services, such as Amazon SageMaker and Amazon Comprehend.
- Distribute the creation of advanced analytics through the visual platform, fostering greater collaboration between technical and non-technical teams.
- Leverage the Dataiku LLM Mesh to connect to Amazon Bedrock for Chat, RAG, and Agentic workflows.
AI at Scale, Supported Every Step
With expert services and a robust learning platform, Dataiku helps organizations of any size adopt AI at scale - quickly and confidently.
With Dataiku's visual, end-to-end collaborative AI platform:
- Data Scientists spend more time on high-impact AI projects, leveraging the languages and tools they already know, automating repetitive tasks, and collaborating efficiently with stakeholders.
- Business Analysts generate deeper intelligence, faster, thanks to comprehensive data access, smart data preparation, and accessible machine learning.
- Data Teams can deliver more projects and more value from analytics and AI, all with built-in transparency and governance.
Dataiku and AWS innovate together to enable organizations of any size to deliver enterprise AI in a highly scalable environment.
- Dataiku natively integrates with AWS services and products to enable organizations of any size to deliver enterprise AI at scale.
- Dataiku enables users to ingest and manipulate a wide variety of data, including Amazon Athena, Amazon Redshift, and more, from the AWS ecosystem and beyond.
- Dataiku empowers analytics teams to extend data science collaboration through integrations with Amazon SageMaker.
Get started today with Dataiku on AWS!
Highlights
- Take full advantage of your investment in the AWS platform with Dataiku's unique push down to Amazon's storage and compute.
- Empower more users to clean and enrich data, build advanced data pipelines, and create machine learning models in a visual interface.
- Accelerate deployment on AWS, leveraging Amazon SageMaker and Amazon Bedrock, with a fully managed service (SaaS) hosted and managed by Dataiku.
Details
Introducing multi-product solutions
You can now purchase comprehensive solutions tailored to use cases and industries.
Pricing
Vendor refund policy
Refunds are not provided, but you can cancel at any time.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
See the release notes at https://doc.dataiku.com/dss/latest/release_notes/index.html
Additional details
Usage instructions
Browse to http(s)://INSTANCE_PUBLIC_ADDRESS/
You may need to wait a few minutes for the instance to start and initialize.
On first connection, you will be asked to authenticate to prove that you are the owner of the instance (via basic access authentication):
- login = the instance ID
- password = (empty)
You will then have access to the Dataiku DSS visual interface. Note that only Chrome and Firefox are supported.
Administrative (command-line) access can be obtained through ssh centos@INSTANCE_PUBLIC_ADDRESS. A standard installation of Dataiku DSS runs under the Linux user account "dataiku".
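As a minimal sketch, the first authentication described above can be reproduced from Python's standard library by sending a basic-auth header whose login is the instance ID and whose password is empty. The address and instance ID below are placeholders, not real values:

```python
import base64
import urllib.request

# Placeholder values -- replace with your instance's real public address and ID
instance_address = "ec2-198-51-100-1.compute-1.amazonaws.com"
instance_id = "i-0abc123def456"

# Basic access authentication: login = instance ID, password = empty
# (hence the trailing colon before base64-encoding).
token = base64.b64encode(f"{instance_id}:".encode()).decode()
request = urllib.request.Request(
    f"http://{instance_address}/",
    headers={"Authorization": f"Basic {token}"},
)

# Uncomment against a live instance; it should serve the DSS interface.
# urllib.request.urlopen(request)
print(request.full_url)
```

The same credentials work with any HTTP client (for example `curl -u INSTANCE_ID:` in a terminal); the snippet only shows how the header is formed.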
For additional information or any issues, please see our Resources and Q&A pages.
Resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Automating end-to-end data pipelines has boosted team productivity and simplified analytics
What is our primary use case?
My main use cases for Dataiku are general: I create ETL pipelines and automate everything with it, along with ML modeling. On a daily basis, we also use Dataiku for ad hoc analysis to follow the product lifecycle.
For one of my use cases with Dataiku, the data resides in Snowflake and the expectation is to orchestrate and automate a complete CI/CD pipeline, with the final data landing on S3. In between, there are multiple transformations and pieces of business logic that we have to build in, along with all the DQ checks, the data quality framework, and data governance. We automated everything using Dataiku, and now the project is live with very good overall efficiency.
Automating that workflow with Dataiku increased the overall productivity of the team compared to the tasks we used to do with other ETL tools. Dataiku has optimized that: data visualization became easy, the checkpoints that Dataiku provides, such as analyzing the data and finding outliers, simplified our work, and sharing datasets became straightforward as well. Now, with the visual recipes, even people who can't code can do the transformations, so overall it is a good tool.
We have created a few visual recipes that are not limited to a single project; we packaged them so that users don't have to code that part of the logic again and again. We provide them as recipes, which is a good thing.
What is most valuable?
The best features Dataiku offers include the data analysis part, the ETL, and the overall orchestration part. We can create a recipe and share it with others without having to code that again and again, and we can create an application and a dashboard in one single place. These are the very good features of Dataiku.
I find myself and my team relying the most on the data analysis part of Dataiku. We use it to visualize the data, find the outliers, and it helps us very well.
Dataiku has positively impacted my organization as most of our projects have been migrated to Dataiku, and now people are relying on it as a go-to tool for all our data use cases. This migration has led to measurable improvements, as most of the projects have been migrated and the overall efficiency has increased. Most people who used to do tasks manually are now working on automating that.
What needs improvement?
Dataiku can be improved from the dashboard perspective because right now it is very restricted, and I feel that can be improved. API integration and other aspects can also be enhanced, but I am pretty impressed with the rest of it.
For how long have I used the solution?
I have been using Dataiku for four years now.
What do I think about the stability of the solution?
Dataiku is stable.
What do I think about the scalability of the solution?
Dataiku's scalability is pretty good; I can scale the projects very easily, and clear guidance is given as well. I have no issues with that.
How are customer service and support?
I need to stress the customer support aspect: there are some product issues we have identified and raised with customer support, but sometimes the response is delayed, so that can be improved.
Which solution did I use previously and why did I switch?
I previously used a different solution before Dataiku; it was not cloud-based but on-premises, which made the license cost higher.
What about the implementation team?
We have a direct link with Dataiku; we did not purchase it through the AWS Marketplace.
What was our ROI?
I cannot share the numbers regarding return on investment.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup costs, and licensing is good because that was managed by my IT team, and overall it was seamless with clear guidance given.
Which other solutions did I evaluate?
Before choosing Dataiku, I evaluated other options, specifically Databricks.
What other advice do I have?
My advice to others looking into using Dataiku is to first understand the product, which is very important. You should first see what your use case is and what Dataiku offers, and understand that it is a tool meant not only for coders but also for higher management, as they can use drag and drop to perform transformations easily without needing to write code. Dataiku is a tool for everyone. I would rate this product an 8.5 out of 10.
Visual workflows have streamlined daily ETL analysis and support collaborative project work
What is our primary use case?
My main use case for Dataiku involves ETL pipelines, mainly for data analysis, and I majorly use SQL queries for that.
For ETL pipelines and data analysis, I had to create the output by combining a few datasets and then running SQL queries, applying filters, joining the tables, and so on; so I used Dataiku for that.
Regarding my main use case with Dataiku, I use it primarily for analysis, and the visual recipes and SQL queries are enough for that. The only challenge so far is that Dataiku sometimes gets slow and lags a lot.
What is most valuable?
The best features Dataiku offers in my experience are its visual recipes, which are very easy to use for analysis.
The visual recipes are easy and useful for my analysis because the Sync recipe is very useful if I want to download a table from the cloud into the Dataiku database and schema. Other recipes such as the Prepare recipe are also very useful since you don't have to write code; it's all visual and very easy to use. Recipes such as Stack are also very useful as you don't have to write full SQL code for it, allowing you to speed up the process.
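Conceptually, the Stack recipe described above appends rows from inputs that share a schema, much like a SQL UNION ALL, without the SQL being written by hand. A plain-Python sketch of that idea, with hypothetical datasets:

```python
# Two hypothetical datasets with the same schema (columns: region, amount).
q1_sales = [{"region": "EMEA", "amount": 100}, {"region": "APAC", "amount": 80}]
q2_sales = [{"region": "EMEA", "amount": 120}]

# Stack: concatenate the rows of both inputs; the schema is unchanged.
stacked = q1_sales + q2_sales
print(len(stacked))  # → 3
```

This is only an illustration of the operation's semantics; inside Dataiku the recipe is configured visually and runs against the connected storage.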
Dataiku has positively impacted my organization since we use it majorly for our day-to-day work, and it is very helpful in creating and managing ETL pipelines to create a project flow, making it easy to go back to any step and then make edits if some changes occur.
What needs improvement?
I have no suggestions for improvements because it's all good; it just sometimes lags a lot, and I don't know if the server is full or what, but it sometimes takes a lot of time while loading and refreshing the page.
No additional improvements come to mind, but the performance could be optimized to reduce waiting time. Dataiku is down fairly often, and we sometimes have to wait five, ten, or fifteen minutes before it starts working again; during those times, we are unable to get our work done.
For how long have I used the solution?
I have been using Dataiku for four years, so my experience with it is quite extensive.
What do I think about the stability of the solution?
Dataiku is stable most of the time, but it is down for around ten percent of the day, during which we are unable to work on it.
What do I think about the scalability of the solution?
Dataiku's scalability is good.
How are customer service and support?
I have never needed customer support from Dataiku.
Which solution did I use previously and why did I switch?
I have been using Dataiku for the last four years, and I have not used any other solution besides Dataiku.
What was our ROI?
It is a good return on investment since it helps save a lot of time, and it's easy for my teammates to work cross-functionally on the same project.
Which other solutions did I evaluate?
I did not evaluate other options before choosing Dataiku because it was all managed by my organization, so I had to use Dataiku only.
What other advice do I have?
My advice for others looking into using Dataiku is that it's good software, and I would suggest they keep using it, since it's a very good tool for data analysis.
I have no additional thoughts about Dataiku; it's all very good for our use cases, but if the performance could be improved to be more stable with fewer lags, it would be much better. I would rate my overall experience with Dataiku an 8 out of 10.
Visual workflows have streamlined healthcare analytics and have reduced reporting time significantly
What is our primary use case?
My main use case for Dataiku is mostly based on clients' data in life sciences, aligned to claims, campaign measurement, and campaign targeting, with datasets such as IQVIA, LAD Epsilon, and Komodo.
Apart from this, I'm basically working on creating an end-to-end pipeline as a bundled project. We primarily work on Next Best Engagement and Next Best Actions, more or less aligned to the healthcare side, while sometimes working on the consumer front and on the professionals front, meaning healthcare professionals (HCPs).
A specific project I built in Dataiku was on HCP campaign measurement. Our day-to-day cycle involves ingesting data from S3, the client's storage. We fetch the data, use visual recipes to bring it into Dataiku DSS, make preliminary changes, preprocess the data, do some data preparation, and perform feature engineering to produce the final model-ready dataset.
We create multiple iterations of the model where Dataiku is of great help, allowing us to try multiple modeling iterations with different hyperparameters, saving a lot of time and providing a visual overview for everyone to understand how the data is performing. Once the modeling is done, we push the data downstream through an API or use MLOps for productionization, either via CI/CD pipeline or just simple scenario triggers such as sending an email once a job gets done. This primarily results in our day-to-day activity.
What is most valuable?
The best features Dataiku offers include primarily the visual recipes, which ease data preparation greatly. It is very easy now to handle small tasks where you need to understand the shape of data; instead of writing a query, you can just use a visual recipe to create the views. You can also have multiple intermediate views, which is significantly helpful for larger streams, especially during reverse engineering.
Additionally, the automation piece and scenario triggering has been a boon for me, as my projects often involve weekly or monthly reporting. Everything is set up so that we just need a human in the loop to ensure everything follows properly, with time-based triggers automatically generating and sending reports to stakeholders.
Furthermore, the integration capabilities and the ability for multiple team members to access the same projects concurrently enhance collaboration, making it quite beneficial for data scientists such as myself as we progress in our careers.
Dataiku has positively impacted my organization, specifically in one project where we performed migration from AWS to Dataiku, speeding up the solution by close to 40%. We completed tasks that used to take 10 days in just four days. Moreover, the architecture costs associated with AWS were reduced by almost 70%, which was a significant benefit and greatly impacted our operations. This success has enabled us to pitch Dataiku to clients, who have actively incorporated it into their daily work streams, resulting in a win-win situation.
The 70% cost reduction and 40% faster delivery came primarily from the ease of use in how we were creating architecture. Since we were migrating, we leveraged the opportunity to improve and enhance the architecture. The earlier AWS architecture was hampered by multiple services leading to high costs, but moving to Dataiku streamlined everything into one platform. Consequently, the delivery time for generating reports for stakeholders decreased from 10 days to three to four days.
What needs improvement?
In terms of improvement, I cannot comment on the LLMs or the agentic view as I have not used them yet. However, I feel that better documentation is necessary. Dataiku should establish a stronger community since this is proprietary software, where users can share knowledge. Although they have some community interaction, it is often challenging to find assistance when stuck.
For example, when I was new to Dataiku and trying to use an external optimization tool such as CPLEX, I struggled with resource directory linking to a project's notebook. Detailed documentation and community discussions could have significantly alleviated these issues for users such as myself.
For how long have I used the solution?
I have been using Dataiku for close to three and a half years.
What do I think about the stability of the solution?
There were a few challenges, but they were not technical issues on Dataiku's side; they were more related to the rapid updates. We were working on version 10.2 and soon moved to version 11.4, which required things to be redone. Support for projects created on an older version is something the team could look at; a backup proposition would help avoid our work being hampered by updates.
What other advice do I have?
While I do not have a particular feature that surprised me, I found the plugins available in Dataiku to be very helpful. Not only can users leverage existing plugins, but we can also create our plugins based on the rules we use daily. This feature is quite handy and extends beyond just individual projects, as published plugins can be used by everyone across the board.
My advice to others considering Dataiku is to utilize the visual recipes, as they can significantly expedite project creation. Although the fundamental processes remain the same, leveraging elements in visual recipes can enhance efficiency, making it easier than writing code for basic queries, resulting in quicker execution. Dataiku encompasses everything from visualization to integrations and sharing the results, so once you dive in, it is important to familiarize yourself with the available features and make the most out of them.
I would rate this product a 9 out of 10.
Reusable preparation workflows have transformed recurring datasets and automated end-to-end projects
What is our primary use case?
My main use of Dataiku is data preparation. I use Dataiku a lot for preparing my data, in particular processing and transforming my datasets using recipes, and what I really value is being able to reuse recipes already created for preparation on another dataset.
I was preparing for my Core Dataiku certificate, and all of the modules were focused on data preparation. I load the data into Dataiku, then I use the recipes and tools to add columns, unpivot columns, delete, and transpose columns so I can format them. Then I create groups of recipes and I also reuse them by importing another dataset into Dataiku, which gives me the ability to save time. I don't have to redo all the previous processes since I already have a recipe for data preparation that I can reuse.
After data preparation, I had the opportunity to carry out an end-to-end project with Dataiku. This involved first the data preparation and then I went on to set up a model to predict a stock index. I used the machine learning models for this project.
Let's suppose I have datasets that I use every time, and each time I'm going to check the data formats to format a certain number of columns, for example dates, to see if they are in date format or not, delete certain columns, rename certain columns, transform the data, and clean them. If I've done all these steps once and I manage to put all these recipes into a group, next time it's an enormous time saver not to have to repeat these steps one by one, but to use directly the recipe or the group of recipes created.
I want to emphasize again the recipes and how we can reuse them with Dataiku. In most data projects, data preparation takes a huge amount of time for professionals, and sometimes unfortunately we repeat the same tasks. Dataiku really brings a solution to this in that we can create groups of recipes or recipes that we can reuse. The reuse of recipes is already a significant advantage. Additionally, Dataiku is one of the rare platforms that offers end-to-end services where we can carry out a data project from start to finish. For example, to carry out my personal project on predicting a stock index, I did practically everything on Dataiku, from data preparation to putting my model into production.
For example, we have an Excel file that will be incremented every month, a file that comes from outside where the format of the file is exactly the same. Each time we do the transformation, but we did it only on the first file. When the others come in, we apply the old recipes. This allowed us to save an enormous amount of time because we were able to automate everything with Dataiku. As soon as the file comes in, the recipes are automatically applied to these files. We no longer have to intervene at the end of each month and spend twenty to thirty minutes cleaning the files. Additionally, it ensures the validation of our data and consistency between the files.
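The reusable-recipe idea described above can be sketched in plain Python: define the preparation once, then apply it unchanged to each month's file. The column names and cleaning steps below are hypothetical stand-ins for the visual-recipe steps the review mentions:

```python
from datetime import datetime

def prepare(rows):
    """Apply the same preparation 'recipe' to any month's file."""
    prepared = []
    for row in rows:
        cleaned = {
            "customer": row["cust_name"].strip().title(),             # rename + tidy text
            "date": datetime.strptime(row["dt"], "%d/%m/%Y").date(),  # parse date format
            "amount": float(row["amt"]),                              # enforce numeric type
        }
        prepared.append(cleaned)
    return prepared

# Each new monthly file reuses the same recipe -- no manual cleaning.
january = [{"cust_name": " alice smith ", "dt": "15/01/2025", "amt": "120.50"}]
print(prepare(january)[0]["customer"])  # → Alice Smith
```

In Dataiku the equivalent is configured once as a recipe (or group of recipes) and triggered automatically when a new file arrives, which is where the monthly time savings come from.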
What is most valuable?
What I especially prefer about Dataiku are the recipes for data preparation and also the feature we have to create groups of recipes and also to be able to reuse them again.
Once you get the hang of Dataiku, learning the features is also intuitive. It's really a very intuitive platform.
Dataiku is very scalable. It can easily adapt to the expansion of our datasets and it is very powerful. If we have more and more data, Dataiku is very scalable.
What needs improvement?
Currently, Dataiku is a platform that is almost perfect, and I don't see how to improve it further. I don't have suggestions for potential improvements.
Maybe on the interface in general, the information can easily get lost. If we could summarize the tools bar in a more organized way than what we currently have, that would be helpful.
For how long have I used the solution?
I have been using Dataiku for a year and a half, and I obtained my Core Dataiku certification about a month ago.
What's my experience with pricing, setup cost, and licensing?
The licenses are a bit high for companies that are still hesitating to get started with using Dataiku. For my personal projects, I used the thirty-day free trial. Regarding my company, I did not have access to this pricing information.
Which other solutions did I evaluate?
Given our needs, and despite the cost of the licenses, Dataiku turns out to be by far our favorite tool compared to the others.
What other advice do I have?
Dataiku is really a very intuitive platform. It allows you to carry out data projects from end to end. We also have the opportunity to reuse templates, models, and recipes. That's one of the big advantages of using Dataiku.
In the context of my personal projects, I developed a pleasure in using Dataiku, which is not the case for other tools. Because the platform is intuitive, I can easily guide myself through it.
I find the documentation on Dataiku very informative and also very instructive.
I would tell others to go for it if Dataiku truly meets their needs. It's the best offer on the market with good documentation. My overall rating for Dataiku is nine out of ten.
Dataiku: A plug-in tool for data science
For me, it has been helpful because it simplifies the process of turning raw data into useful insights and models. It also improves collaboration between technical and non-technical teams, since analysts can use the visual interface while data scientists can still write code when needed. Overall, it helps speed up the development process and makes data projects more structured and easier to manage.