Overview
For Private Offer Pricing, please contact:
PrivateOfferPricing@pentaho.com
Datasheet: Pentaho Data Integration
With Pentaho Data Integration, managing the enormous volume, variety, and velocity of data is simplified.
By allowing data preparation from any source and automating your data pipeline, Pentaho Data Integration helps you curate better data for your business users. It delivers business analytics to end users faster, with visual tools that reduce time and complexity, without writing SQL or coding in Java or Python. Organizations can quickly gain value from their data sources in the cloud or on premises, including files, relational databases, big data sets, and more.
Turn Data Into Actionable Insights
More than just ETL (Extract, Transform, Load), Pentaho Data Integration is a codeless data orchestration tool that blends diverse data sets into a single source of truth for analysis and reporting. Pipelines are managed in a drag-and-drop graphical interface, so you can easily track where data comes from, where it goes, and how it is transformed.
Data Processing Performance and Productivity
PDI improves processing performance, reduces the complexity of integrating big data sources, and provides:
- Code-free data transformation
- Template-based approach to rapidly onboard data sources into Hadoop
Scalability, Simplicity, and Self-Service
With broad connectivity to any data type and high-performance Spark and MapReduce execution, PDI simplifies and speeds the process of integrating existing databases with new sources of data.
- Intuitive, drag-and-drop designer
- Rich library of prebuilt components
- Powerful orchestration capabilities
Integration and Extensibility
- API Integration: Comprehensive REST and SOAP APIs
- Plugin Architecture: Extend capabilities with a rich plugin ecosystem
- Third-Party Tool Integration: BI tools, databases, and more
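The API Integration bullet mentions REST APIs. As a minimal Python sketch of calling such an API from a script, the helper below builds (but does not send) an authenticated GET request. It assumes HTTP Basic authentication; the function name `build_pdi_request`, the path `/example/endpoint`, the credentials, and the documentation-range IP address are all hypothetical placeholders. Check the product documentation for the real endpoints and authentication scheme your version supports.

```python
import base64
import urllib.request


def build_pdi_request(base_url: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build an authenticated GET request without sending it.

    Assumption: the server accepts HTTP Basic auth. `path` is a placeholder
    to be replaced with an endpoint from your version's API documentation.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Join base URL and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


# Hypothetical host, path, and credentials for illustration only.
req = build_pdi_request("http://203.0.113.10:8080", "/example/endpoint", "admin", "secret")
print(req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)` once the path points at a real endpoint.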
Broad Connectivity and Data Delivery
PDI offers broad connectivity to diverse data sources, including structured, unstructured, and semi-structured data.
- Relational database management systems (RDBMS) and messaging: Oracle, IBM DB2, MySQL, Microsoft SQL Server, Postgres, IBM MQ
- Spark and Hadoop: Cloudera, Hortonworks, Amazon EMR, MapR (HPE Ezmeral Data Fabric), Microsoft Azure HDInsight, and Elasticsearch
- NoSQL databases and object stores: MongoDB, Cassandra, HBase, Hitachi Content Platform, Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage (ADLS) Gen2
- Analytic databases: Amazon Redshift, Snowflake, Vertica, Greenplum, Teradata, SAP HANA, Google BigQuery
- Business applications: SAP, Salesforce, Google Analytics
- Files: XML, JSON, Microsoft Excel, CSV, plain text (TXT), Avro, Parquet, ORC, EBCDIC (mainframe), and unstructured files with metadata, including audio, video, and visual files
Highlights
- Code-free data transformation design that enables 15x faster productivity versus hand-coding and executes in-cluster for high performance.
- Template-based approach to rapidly onboard data sources into Hadoop via the metadata injection feature set.
- Ability to seamlessly switch between execution engines, such as Spark and the PDI native engine, to fit data volume and transformation complexity.
- Support for advanced analytics models from R, Python, Scala, and Weka to operationalize predictive intelligence while reducing data prep time.
- Robust dataflow orchestration of pipelines.
- Support for both structured and unstructured data.
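To illustrate the idea behind template-based onboarding, here is a conceptual Python sketch: one shared template is combined with per-source metadata to produce a concrete pipeline definition for each source, so adding a source means adding metadata rather than building a new pipeline by hand. This is not PDI's metadata injection API; the dictionary layout, `SourceMetadata`, `inject`, and the S3 paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical per-source metadata that gets injected into the template."""
    name: str
    path: str
    fields: dict[str, str]  # source column name -> target column name


def inject(template: dict, meta: SourceMetadata) -> dict:
    """Return a concrete pipeline definition built from the shared template."""
    return {
        "input": {"type": template["input_type"], "path": meta.path},
        "rename": dict(meta.fields),
        "output": {"table": template["target_prefix"] + meta.name},
    }


# One template, many sources (hypothetical values).
TEMPLATE = {"input_type": "csv", "target_prefix": "stg_"}
sources = [
    SourceMetadata("orders", "s3://example-bucket/orders.csv", {"ord_id": "order_id"}),
    SourceMetadata("customers", "s3://example-bucket/customers.csv", {"cust_nm": "customer_name"}),
]
pipelines = [inject(TEMPLATE, s) for s in sources]
```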
Details
Typical total price
$3.666/hour
Pricing
Free trial
Instance type | Product cost/hour | EC2 cost/hour | Total/hour |
---|---|---|---|
m5.large (recommended) | $3.57 | $0.096 | $3.666 |
m5.xlarge | $7.13 | $0.192 | $7.322 |
m5.2xlarge | $12.49 | $0.384 | $12.874 |
m5.4xlarge | $21.83 | $0.768 | $22.598 |
m5.8xlarge | $36.53 | $1.536 | $38.066 |
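In the pricing table, Total/hour is the product cost plus the EC2 cost for each instance type. A minimal Python sketch that reproduces that column, using rates copied from the table; `Decimal` avoids floating-point rounding in currency arithmetic, and the names `RATES` and `total_per_hour` are illustrative.

```python
from decimal import Decimal

# (product cost/hour, EC2 cost/hour) in USD, copied from the pricing table.
RATES = {
    "m5.large": (Decimal("3.57"), Decimal("0.096")),
    "m5.xlarge": (Decimal("7.13"), Decimal("0.192")),
    "m5.2xlarge": (Decimal("12.49"), Decimal("0.384")),
    "m5.4xlarge": (Decimal("21.83"), Decimal("0.768")),
    "m5.8xlarge": (Decimal("36.53"), Decimal("1.536")),
}


def total_per_hour(instance_type: str) -> Decimal:
    """Total hourly cost = product cost + EC2 infrastructure cost."""
    product, ec2 = RATES[instance_type]
    return product + ec2


for itype in RATES:
    print(itype, total_per_hour(itype))
```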
Additional AWS infrastructure costs
Type | Cost |
---|---|
EBS General Purpose SSD (gp2) volumes | $0.10 per GB-month of provisioned storage |
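A rough monthly estimate combines the hourly total with the EBS storage charge. The Python sketch below assumes an instance running 730 hours per month (a common monthly-hours convention, not a figure from this listing) and a hypothetical 100 GB gp2 volume; actual bills depend on real usage and any proration.

```python
from decimal import Decimal

HOURS_PER_MONTH = Decimal("730")        # assumption: always-on instance
EBS_GP2_PER_GB_MONTH = Decimal("0.10")  # from the infrastructure cost table


def monthly_estimate(total_per_hour: str, hours: Decimal, ebs_gb: int) -> Decimal:
    """Compute (instance + product) cost plus EBS storage, rounded to cents."""
    compute = Decimal(total_per_hour) * hours
    storage = Decimal(ebs_gb) * EBS_GP2_PER_GB_MONTH
    return (compute + storage).quantize(Decimal("0.01"))


# m5.large at $3.666/hour with a hypothetical 100 GB volume.
estimate = monthly_estimate("3.666", HOURS_PER_MONTH, 100)
print(estimate)
```

Here 3.666 × 730 = 2,676.18 for compute plus 100 × 0.10 = 10.00 for storage.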
Vendor refund policy
No Refunds
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Additional details
Usage instructions
Application access: http://<IP_address>:8080
Launching a Pentaho instance in a hyperscaler: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/9.5.x/mk-95pdia001/pentaho-installation/hyperscalers/launching-a-pentaho-instance-in-a-hyperscaler
Product Documentation: https://docs.hitachivantara.com/p/pentaho-dia
Getting Started: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia000
Administration: https://docs.hitachivantara.com/r/en-us/pentaho-data-integration-and-analytics/10.2.x/mk-95pdia002
Pentaho website: https://pentaho.com/
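After launching the AMI, the application is reached at http://<IP_address>:8080. A minimal Python sketch for checking that the port accepts TCP connections before opening a browser; the helper names and the example IP address are hypothetical. If the check fails, one common cause is that the instance's security group does not allow inbound TCP traffic on port 8080 from your address.

```python
import socket


def access_url(ip: str, port: int = 8080) -> str:
    """Build the application access URL for the instance."""
    return f"http://{ip}:{port}"


def port_open(ip: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Hypothetical public IP of the launched instance.
ip = "203.0.113.10"
print(access_url(ip))
```

A successful TCP connection only shows that something is listening on the port; the server may still need time to finish starting before the web UI loads.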
Resources
Vendor resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.
Customer reviews
Totally worth it!!
Has portability issues and is less user-friendly.
ETL for Dashboards
ETL with graphical interface
PDI, best data cleaning tool
1. It's under the Apache 2.0 license, so as long as you read and work within its terms, you can use this powerful tool for free
2. It has a very friendly user interface, so anybody, even without strong programming skills, can build some transformations in just minutes
3. It supports a wide variety of data input formats, letting you read from simple CSV or Excel files to databases, JSON, and even S3 storage
4. It has a lot of tools for transforming your data without coding
5. If PDI's built-in functions aren't enough for you, you can add scripting steps
Open Source ETL Tools
Pentaho Data Integration (PDI) performs very well compared to paid ETL tools. The product is quite simple to use: the panel on the left side contains all the components the user needs (for example, Excel connection, row value, etc.). In my experience, the Logging screen is not descriptive, and sometimes you cannot identify the source of an error. Other than that, I am very satisfied with the PDI tool.