My main use of Dataiku is data preparation. I use it heavily to process and transform my datasets with recipes, and what I value most is being able to reuse recipes I have already created on another dataset.
When I was preparing for my Core Dataiku certificate, all of the modules focused on data preparation. I load the data into Dataiku, then use the recipes and tools to add, unpivot, delete, and transpose columns until the data is in the format I need. I then group the recipes and reuse them when I import another dataset, which saves time: I don't have to redo the previous steps because I already have a data preparation recipe I can reuse.
Beyond data preparation, I had the opportunity to carry out an end-to-end project with Dataiku. I started with the data preparation and then used the machine learning models to build a model that predicts a stock index.
Let's suppose I have datasets that I use regularly, and each time I have to check the formats of a certain number of columns (for example, whether date columns are really in date format), delete certain columns, rename others, and transform and clean the data. If I've done all these steps once and put the recipes into a group, the next time is an enormous time saver: instead of repeating the steps one by one, I directly apply the recipe or group of recipes I already created.
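To make the idea concrete outside of Dataiku, here is a minimal sketch in plain Python of "define the preparation steps once, then apply them to any dataset". It is not Dataiku's recipe mechanism or API. The column names (`order_date`, `amt`, `notes`), the `prepare` and `parse_date` helpers, and the date format are all hypothetical, chosen only for illustration.

```python
from datetime import datetime


def parse_date(value, fmt="%Y-%m-%d"):
    """Return a date if value matches fmt, otherwise None."""
    try:
        return datetime.strptime(value, fmt).date()
    except (TypeError, ValueError):
        return None


def prepare(rows, date_columns=(), drop_columns=(), rename=None):
    """Apply the same preparation steps to any list of row dicts."""
    rename = rename or {}
    prepared = []
    for row in rows:
        clean = {}
        for name, value in row.items():
            if name in drop_columns:
                continue  # delete unwanted columns
            if isinstance(value, str):
                value = value.strip()  # basic cleaning
            if name in date_columns:
                value = parse_date(value)  # None flags a value that is not a valid date
            clean[rename.get(name, name)] = value  # rename where requested
        prepared.append(clean)
    return prepared


# Hypothetical settings, written once and reused for every dataset
SETTINGS = dict(
    date_columns=("order_date",),
    drop_columns=("notes",),
    rename={"amt": "amount"},
)

january = [{"order_date": "2024-01-15", "amt": " 10.5 ", "notes": "x"}]
february = [{"order_date": "15/02/2024", "amt": "7", "notes": "y"}]

print(prepare(january, **SETTINGS))
print(prepare(february, **SETTINGS))  # the date does not match the format, so it becomes None
```

The point of the sketch is the reuse: the settings play the role of the saved group of steps, and a new dataset only needs one call.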
I want to emphasize again the recipes and how we can reuse them. In most data projects, data preparation takes up a huge amount of professionals' time, and unfortunately we sometimes repeat the same tasks. Dataiku solves this by letting us create recipes, or groups of recipes, that we can reuse, which is already a significant advantage. Additionally, Dataiku is one of the few platforms that offers an end-to-end service where we can carry out a data project from start to finish. For example, for my personal project on predicting a stock index, I did practically everything in Dataiku, from data preparation to putting my model into production.
For example, we receive an Excel file from an external source every month, and its format is always exactly the same. We built the transformation only once, on the first file, and when the following files come in we apply the existing recipes. This saved us an enormous amount of time because we were able to automate everything with Dataiku: as soon as a file arrives, the recipes are applied to it automatically. We no longer have to step in at the end of each month and spend twenty to thirty minutes cleaning the file. It also helps validate our data and keeps the files consistent with each other.
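The monthly workflow described above can be sketched in plain Python as follows. This is only an illustration of the pattern (check that the incoming file has the expected layout, then run the transformation defined on the first file), not how Dataiku implements it. It assumes the monthly file has been exported as CSV text, and the column names, the `EXPECTED_COLUMNS` list, and the cleaning steps are hypothetical.

```python
import csv
import io

# Hypothetical layout, taken from the first file that was processed
EXPECTED_COLUMNS = ["date", "region", "sales"]


def load_monthly_file(text):
    """Read one monthly file and reject it if its columns differ from the first file."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


def clean(rows):
    """The transformation defined once on the first file (hypothetical steps)."""
    return [
        {
            "date": r["date"],
            "region": r["region"].strip().upper(),  # normalize region labels
            "sales": float(r["sales"]),  # convert text to a number
        }
        for r in rows
    ]


def process(text):
    """Validate the layout, then apply the saved transformation."""
    return clean(load_monthly_file(text))


march = "date,region,sales\n2024-03-31, north ,120.5\n"
bad = "date,area,sales\n2024-04-30,north,99\n"

print(process(march))
try:
    process(bad)
except ValueError as err:
    print("rejected:", err)
```

Checking the column layout before transforming is what gives the consistency between files: a file that does not match the first one is rejected instead of being silently cleaned in the wrong way.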