AWS Marketplace: Matillion ETL for Snowflake Comments

Some of the valuable features are mid-pipeline data sampling and automatic database-object verification.

By RichardWilliams
on 01/14/2019

* It enabled an enterprise data warehouse to be set up and operated quickly and cheaply.

* The pipeline UI provides a means to present solutions to analysts and non-tech management for review and agreement.

What is most valuable?

* It works well with AWS Redshift: I have used Informatica, SnapLogic, and Talend and they do not work well with Redshift.

* Mid-pipeline data sampling: Without changes to pipelines, one can quickly and easily track down errors.

* Automatic database-object verification: A fundamental part of how Matillion works is ensuring objects, tables, columns, views, and other metadata are ready and available for use if, and when, a pipeline is started.

* Run-time parameters: These enable pipelines to be organized and modularized with minimum effort.

* An array of source-data components: Pulling data from wide tables, such as SFDC, can be set up in seconds. Using Sqoop to pull data to S3, for example, can take hours or even weeks to get right. It is worth noting that Matillion does not cost more if you need more source-data components (SFDC, Marketo, Google AdWords, RDS, MongoDB, etc.).

* A wide array of in-database DDLs and UDFs accessible from ETL are able to process unstructured data easily, without having to resort to EMR/Hadoop solutions.

* Integration with other AWS services: It can use Amazon SQS or SNS very easily to extend capabilities, such as doing micro-batch imports (near real-time updates) from source-systems.

* Python components using Boto and other libraries running on Matillion's EC2: Matillion can push data from a data warehouse, via REST APIs, to target systems like DynamoDB and Marketo.
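
As a rough illustration of the micro-batch idea mentioned above, here is a minimal sketch of how a Python component might group queue messages into batches before loading them. The function name `drain_in_batches`, the `receive` callable, and the batch sizes are hypothetical and are not part of Matillion or any AWS library; the queue client is passed in so the logic can be tried without AWS access.

```python
from typing import Callable, Iterator, List


def drain_in_batches(
    receive: Callable[[int], List[dict]],
    max_per_call: int = 10,   # assumption: SQS returns at most 10 messages per call
    batch_size: int = 25,     # assumption: arbitrary micro-batch size
) -> Iterator[List[dict]]:
    """Yield lists of up to `batch_size` messages until `receive` returns nothing."""
    batch: List[dict] = []
    while True:
        messages = receive(max_per_call)
        if not messages:
            break  # queue is empty for now; stop polling
        for message in messages:
            batch.append(message)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, partially filled batch
```

With boto3, `receive` could be a small wrapper such as `lambda n: sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=n).get("Messages", [])`, where `sqs` and `queue_url` are placeholders you would supply. Each yielded batch would then be staged and loaded in one step rather than row by row.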

What needs improvement?

Compared to the likes of traditional ETLs, like Informatica, SnapLogic, and Talend, or even raw Python scripts, this product needs no improvement, as it is so much better.

Any new product like this has teething problems that get solved pretty quickly in the next release. Better user documentation with more examples would be helpful, especially in areas with run-time parameters or JavaScript inserts.

What do I think about the stability of the solution?

There have been some issues with stability over the first year, but Matillion support is very responsive. I have allowed them to log into our system on occasion.

What do I think about the scalability of the solution?

There are no issues with scalability if one strictly does all transformations in-database, using Redshift's DDL/SQL.

All the 'heavy-lifting' is done by Redshift, as it is MPP. Simply adding more nodes deals with scalability. It is worth noting that Matillion does not cost more if you add more Redshift nodes.

If one uses Python components (as opposed to UDFs), one may encounter scalability issues.

The CPU utilization in WatchTower, of Matillion's single EC2 (it is not, itself, MPP), will peak. Therefore, it is best to keep a close watch over what your data engineers are doing with Python components.

How is customer service and technical support?

I would give technical support a rating of 5/5.

Which solutions did we use previously?

We used Informatica, SnapLogic, and Talend. They do not work well with Redshift and they cost more. They do not understand MPP and much of what they do is outside of Redshift, i.e., not in-database.

You need to put them on a bigger EC2 or buy multiple licenses and have multiple EC2s to manage, in order to get scalability.

How was the initial setup?

The initial setup was very straightforward, as it's all done from the AWS Marketplace. A wizard steps you through the process of setup. Due to Matillion's clean and clear architecture, there is not much to configure before one is up and running.

What's my experience with pricing, setup cost, and licensing?

Regardless of the quantity of your data, the size of your cluster, or variety of source systems, the price of Matillion is the same.

The only variable that changes what you pay Matillion is the size of your data engineering team.

* If your team is just one or two people, then you can just use the [t2.medium @ $1.37/hr].

* If you have a bigger team, you will need [m4.large @ $2.74/hr] or even [m4.xlarge @ $5.48/hr].
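
For budgeting, the hourly rates above can be turned into rough monthly and yearly figures. This is a minimal sketch, assuming the instance runs around the clock and a 730-hour average month; the rates are the ones quoted in this review and may no longer be current, and the helper names are made up for illustration.

```python
# Hourly rates as quoted in this review (USD per hour).
HOURLY_RATES = {
    "t2.medium": 1.37,
    "m4.large": 2.74,
    "m4.xlarge": 5.48,
}

HOURS_PER_YEAR = 24 * 365                # 8760, assumes the instance is never stopped
HOURS_PER_MONTH = HOURS_PER_YEAR / 12    # 730, an average month


def monthly_cost(instance: str) -> float:
    """Approximate monthly cost in USD for an always-on instance."""
    return round(HOURLY_RATES[instance] * HOURS_PER_MONTH, 2)


def yearly_cost(instance: str) -> float:
    """Approximate yearly cost in USD for an always-on instance."""
    return round(HOURLY_RATES[instance] * HOURS_PER_YEAR, 2)


if __name__ == "__main__":
    for name in HOURLY_RATES:
        print(f"{name}: ${monthly_cost(name):,.2f}/month, ${yearly_cost(name):,.2f}/year")
```

On these assumptions, the smallest option works out to roughly $1,000 per month, and each step up doubles it. Stopping the instance outside working hours would lower the hourly total accordingly.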

As soon as you can, lock in the yearly discounted price with Matillion, as your level of support availability will increase.

Which other solutions did I evaluate?

We evaluated Informatica, SnapLogic, Talend, Sqoop, and pure Python scripts. Don't go with any of these if your data can be categorized as any two of the following: volume, variety, and velocity.

What other advice do I have?

* Experiment and test it ASAP

* Watch all the videos from Matillion

* Join their webinar series

* Talk with existing users