Overview

This trial version of Dataiku allows you to deploy into your AWS environment for prototyping, testing and evaluating the full extent of Dataiku capabilities.
Dataiku is The Universal AI Platform™, empowering teams to deliver AI and analytics projects faster - all within a secure, collaborative, and governed environment.
- Data Scientists use familiar tools to focus on high-impact work, with automation and streamlined collaboration.
- Business Analysts get faster insights with intuitive data prep and accessible machine learning.
- Data Teams scale projects with built-in governance and transparency.
Built for AWS:
- Connect securely to all data sources, including Amazon S3, Amazon Redshift, and Amazon RDS.
- Scale data and ML processing with Dataiku elastic compute powered by Amazon EKS for Python, R, Spark, and more.
- Accelerate AI development with pre-built workflows integrating AWS AI services, such as Amazon SageMaker and Amazon Comprehend.
- Distribute the creation of advanced analytics through the Dataiku visual platform, fostering greater collaboration between technical and non-technical teams.
- Leverage the Dataiku LLM Mesh to connect to Amazon Bedrock for Chat, RAG, and Agentic workflows.
AI at Scale, Supported Every Step
With expert services and a robust learning platform, Dataiku helps organizations of any size adopt AI at scale - quickly and confidently.
With Dataiku's visual, end-to-end collaborative AI platform:
- Data Scientists spend more time on high-impact AI projects, leveraging the languages and tools they already know, automating repetitive tasks, and collaborating efficiently with stakeholders.
- Business Analysts generate deeper intelligence, faster, thanks to comprehensive data access, smart data preparation, and accessible machine learning.
- Data Teams can deliver more projects and more value from analytics and AI, all with built-in transparency and governance.
Dataiku and AWS innovate together to enable organizations of any size to deliver enterprise AI in a highly scalable environment.
- Dataiku natively integrates with AWS services and products to enable organizations of any size to deliver enterprise AI at scale.
- Dataiku enables users to ingest and manipulate a wide variety of data, including Athena, Redshift, and more, from the AWS ecosystem and beyond.
- Dataiku empowers analytics teams to extend data science collaboration through integrations with Amazon SageMaker.
Get started today with Dataiku on AWS!
Highlights
- Take full advantage of your investment in the AWS platform with Dataiku's unique push down to Amazon's storage and compute.
- Empower more users to clean and enrich data, build advanced data pipelines, and create machine learning models in a visual interface.
- Accelerate deployment on AWS, leveraging SageMaker and Bedrock, with a fully managed service (SaaS) hosted and managed by Dataiku.
Details

Features and programs
Financing for AWS Marketplace purchases
Pricing
Vendor refund policy
Refunds are not provided, but you can cancel at any time.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Please read the release notes at https://doc.dataiku.com/dss/latest/release_notes/index.html
Additional details
Usage instructions
Browse to http(s)://INSTANCE_PUBLIC_ADDRESS/
You might need to wait a few minutes for the instance to start and initialize.
You will first be asked to authenticate, using HTTP basic access authentication, to prove that you are the owner of the instance:
- login = instance id
- password = empty
You will then have access to the Dataiku DSS visual interface. Note that only Chrome and Firefox are supported.
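For scripted checks, the ownership authentication described above can be sketched in Python. This is a minimal illustration using only the standard library, not an official Dataiku client: the helper names `basic_auth_header` and `build_request` are hypothetical, and the instance id `i-0123456789abcdef0` and the address are placeholders to replace with your own values. It assumes only what the instructions state, namely basic access authentication with the instance id as login and an empty password.

```python
import base64
import urllib.request


def basic_auth_header(instance_id: str) -> str:
    """Build an HTTP Basic Authorization header value.

    Per the usage instructions: login = instance id, password = empty.
    (Hypothetical helper, for illustration only.)
    """
    credentials = f"{instance_id}:"  # nothing after the colon: empty password
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(public_address: str, instance_id: str,
                  scheme: str = "http") -> urllib.request.Request:
    """Prepare a request to the instance root URL without sending it."""
    url = f"{scheme}://{public_address}/"
    headers = {"Authorization": basic_auth_header(instance_id)}
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder values: substitute your instance's public address and id.
    request = build_request("INSTANCE_PUBLIC_ADDRESS", "i-0123456789abcdef0")
    print(request.full_url)
    # To actually send it once the instance has finished initializing:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(response.status)
```

The request is built but not sent by default, so the script can be run safely before the instance is reachable. A browser prompts for the same login and password interactively.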
Administrative (command-line) access is available through ssh centos@INSTANCE_PUBLIC_ADDRESS. A standard installation of Dataiku DSS runs under the Linux user account "dataiku".
For additional information, or if you encounter any issue, please see our resources and Q&A pages.
Resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Similar products
Customer reviews
Has enabled reliable data pipeline creation and supports rule-based alerts for quality monitoring
What is our primary use case?
My main use cases in Dataiku include ensuring a strong data pipeline ingestion. We have people from data management, so we need to take care of the pipeline, their data quality, data drifting, all these things. We are taking care of it with the Dataiku rule-based alert systems we have created.
What is most valuable?
The best feature in Dataiku is that once the data is connected in the underneath layer, it flows exceptionally smoothly if you know how to tweak it. If you don't know, then it will create a mess. If you know how to tweak it and make the data according to your requirement, then it will be good. If you don't know and are trying to learn on the production, then it is a disaster.
I have used Dataiku's AutoML tools. The AutoML tools have helped me on the fly, as you can apply the machine learning models. They are continuously reading your data and then creating the feature enablement. The moment feature enablement has happened, then you can do the model registry on the fly. Those model registries can trigger your new data. Imagine whatever the data test and train that is passed. Your operational data which is coming new every day, then that feature is enabled and it will give the reasonable amount of prediction and reasonable amount of value on the column so that you can utilize those. You can consume those in the application layer.
Dataiku's data source integration flexibility is completely up to the requirement. We are not using it for ourselves. We are using it for business teams, and they are sending the requirement and we are ingesting according to their requirement. The important thing is, imagine raw data is coming A, but they need A plus B plus C multiply by D. All those kinds of enablement we are doing with the help of Dataiku.
Our source system, the core system, is continuously throwing the raw data on the landing layer. Then from the landing layer, we are converting those raw data and making it as a consumption layer, consumable data. With the help of this, we are doing it.
What needs improvement?
In terms of enhancing collaboration within my team, I would not say Dataiku is the best one because it's so expensive. We are not able to provide it to everyone. There are very few people who have the developer license and are using it. Once the data pipeline is created, then we are directly handing over that data pipeline to our user on the ingestion layer. It is not a very cost-effective solution, I must say, though it is good for developing purposes only.
Pricing can be improved.
For how long have I used the solution?
I have been using this product for four years.
What do I think about the stability of the solution?
In my opinion, Dataiku is stable because we know how to use it. There are many unstable things happening, so it's not that only the application is stable or unstable. Even so many other things, we are facing challenges. I cannot only blame one thing.
In terms of stabilization, if my data has no outlier creation in the raw data, then it is quite stable. I would rate it a seven.
How are customer service and support?
For support, I haven't created any support tickets, so I really don't know about it, but it is quite good.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup started with HANA. Then they introduced Databricks. When Databricks got live, then they started giving this license for Dataiku. We got the Dataiku license and learning. Everything went smoothly. Now Databricks is replaced by Snowflake. Even on Snowflake, we can do many things.
What was our ROI?
It is hard to say if I've seen a return on investment in Dataiku because we are far away from the monetization of the data. There are other teams who are taking care of the monetization. We are not from resource management, so it becomes very hard for us to calculate the ROIC on this at each and every application level. We are not using only Dataiku, we are using many other products.
Which other solutions did I evaluate?
In my opinion, it is good, not bad. I must say because I'm using many other tools as for a data operating model. It is much better than other tools because it has a clickable solution. Most of our data citizens who really don't know the coding thing can easily do things with the help of the mouse. Most of the things are working fine, so there is nothing to complain about.
What other advice do I have?
Overall, Dataiku is really good. I would rate it an 8 out of 10.