    Pentaho Data Integration

    Free Trial
    Ingest, Blend, Cleanse and Prepare Diverse Data From Any Source, in Any Environment - With No Code.

    Overview

    For Private Offer Pricing, please contact:
    PrivateOfferPricing@pentaho.com

    Datasheet: Pentaho Data Integration


    Pentaho Data Integration simplifies managing the enormous volume, variety, and velocity of data.

    By allowing data preparation from any source and automating your data pipeline, Pentaho Data Integration allows you to curate data better for your business user. This software delivers business analytics to end users faster with visual tools that reduce time and complexity - without writing SQL or coding in Java or Python. Organizations immediately gain real value from their various data sources in the cloud or on premises, including files, relational databases, big data sets and more.

    Turn Data Into Actionable Insights

    More than just ETL (Extract, Transform, Load), Pentaho Data Integration is a codeless data orchestration tool that blends diverse data sets into a single source of truth as a basis for analysis and reporting. Pipelines are managed in a drag-and-drop graphical interface, so you can easily track where data is coming from, where it is going and how it is being transformed.

    Data Processing Performance and Productivity

    PDI speeds up data processing, reduces the complexity of integrating big data sources, and provides:

    • Code-free data transformation
    • Template-based approach to rapidly onboard data sources into Hadoop

    Scalability, Simplicity, and Self-Service

    With broad connectivity to any data type and high-performance Spark and MapReduce execution, PDI simplifies and speeds the process of integrating existing databases with new sources of data.

    • Intuitive, drag-and-drop designer
    • Rich library of prebuilt components
    • Powerful orchestration capabilities

    Integration and Extensibility


    • API Integration: Comprehensive REST and SOAP APIs
    • Plugin Architecture: Extend capabilities with a rich plugin ecosystem
    • Third-Party Tool Integration: BI tools, databases, etc.

    Broad Connectivity and Data Delivery

    PDI offers broad connectivity to a variety of diverse data, including structured, unstructured and semi-structured data.

    • Relational database management system (RDBMS): Oracle, IBM DB2, MySQL, Microsoft SQL Server, Postgres, IBM MQ
    • Spark and Hadoop: Cloudera, Hortonworks, Amazon EMR, MapR (HPE Ezmeral Data Fabric), Microsoft Azure HDInsight, and Elasticsearch
    • NoSQL databases and object stores: MongoDB, Cassandra, HBase, Hitachi Content Platform, AWS S3, Google Cloud Storage, Microsoft Azure ADLS Gen 2
    • Analytic databases: Amazon Redshift, Snowflake, Vertica, Greenplum, Teradata, SAP HANA, Google BigQuery
    • Business applications: SAP, Salesforce, Google Analytics
    • Files: XML, JSON, Microsoft Excel, CSV, txt, Avro, Parquet, ORC, EBCDIC (mainframe), unstructured files with metadata, including audio, video and visual files

    Highlights

    • Code-free data transformation design that enables 15x faster productivity versus hand-coding and executes in-cluster for high performance
    • Template-based approach to rapidly onboard data sources into Hadoop via the metadata injection feature set
    • Ability to seamlessly switch between execution engines, such as Spark and the PDI native engine, to fit data volume and transformation complexity
    • Support for advanced analytics models from R, Python, Scala and Weka to operationalize predictive intelligence while reducing data prep time
    • Robust dataflow orchestration of pipelines
    • Support for both structured and unstructured data

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Ubuntu 20.04 LTS

    Typical total price

    This estimate is based on use of the seller's recommended configuration (m5.large) in the US East (N. Virginia) Region.

    $3.666/hour

    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Free trial

    Try this product at no cost for 30 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

    Pentaho Data Integration

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time. Alternatively, you can pay upfront for a contract, which typically covers your anticipated usage for the contract duration. Any usage beyond the contract will incur additional usage-based costs.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (5)

    Instance type             Product cost/hour   EC2 cost/hour   Total/hour
    m5.large (Recommended)    $3.57               $0.096          $3.666
    m5.xlarge                 $7.13               $0.192          $7.322
    m5.2xlarge                $12.49              $0.384          $12.874
    m5.4xlarge                $21.83              $0.768          $22.598
    m5.8xlarge                $36.53              $1.536          $38.066

    Additional AWS infrastructure costs

    Type                                     Cost
    EBS General Purpose SSD (gp2) volumes    $0.10 per GB/month of provisioned storage
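
    As a rough illustration of how the hourly rates and storage charge above combine, the following Python sketch estimates a monthly bill. The rates are copied from the tables in this listing; the 730-hour month, the 100 GB volume size, and the function name estimate_monthly_cost are assumptions made for the example, not figures or tools provided by the vendor or AWS.

```python
# Hourly rates (USD) copied from the usage-cost table in this listing
# (US East, N. Virginia): (product cost/hour, EC2 cost/hour).
RATES = {
    "m5.large": (3.57, 0.096),
    "m5.xlarge": (7.13, 0.192),
    "m5.2xlarge": (12.49, 0.384),
    "m5.4xlarge": (21.83, 0.768),
    "m5.8xlarge": (36.53, 1.536),
}

# EBS gp2 price from the listing: USD per GB-month of provisioned storage.
EBS_GP2_PER_GB_MONTH = 0.10

# Assumption for illustration: an average month of 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def estimate_monthly_cost(instance_type, hours=HOURS_PER_MONTH, ebs_gb=0):
    """Return an estimated monthly cost in USD, rounded to cents.

    Covers only the product fee, the EC2 instance fee, and gp2 storage.
    Data transfer, taxes, and other AWS charges are not included.
    """
    product_rate, ec2_rate = RATES[instance_type]
    compute = (product_rate + ec2_rate) * hours
    storage = EBS_GP2_PER_GB_MONTH * ebs_gb
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Hypothetical scenario: recommended instance running all month
    # with a 100 GB gp2 volume attached.
    print(estimate_monthly_cost("m5.large", ebs_gb=100))
```

    Under these assumptions, an m5.large running continuously with 100 GB of gp2 storage comes to roughly $2,686 per month (3.666 x 730 hours plus $10 of storage). Running the instance only during working hours would lower the compute portion proportionally.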

    Vendor refund policy

    No Refunds

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Support

    Vendor support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    0 ratings
    5 star: 0%
    4 star: 0%
    3 star: 0%
    2 star: 0%
    1 star: 0%
    0 AWS reviews | 15 external reviews
    External reviews are sourced from G2 and are not included in the star rating for this product.
    Karthick V.

    Totally worth it!!

    Reviewed on Mar 31, 2022
    Review provided by G2
    What do you like best about the product?
    Best price in market, Hitachi sponsored and high quality in data integration.
    What do you dislike about the product?
    Limitation in features, connector is having portability issue and less user friendly.
    What problems is the product solving and how is that benefiting you?
    We used PDI for data integration for designed reports. So far, had the best experience.
    Information Technology and Services

    ETL for Dashboards

    Reviewed on Oct 08, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho Data Integration (aka Kettle) is a tool included in the Pentaho suite that we use in our Smart Cities projects to obtain data from various data sources. It has a large number of tools already built for Input, Output, Transform ... that allow developers to save a lot of time. Its use is easy even for inexperienced users.
    What do you dislike about the product?
    If we want to have support with the Pentaho suite we should not use its Community version (free), but in some Smart Cities specifications of our clients they require a free and open source tool with associated support.
    What problems is the product solving and how is that benefiting you?
    PDI allows us to obtain data from various data sources such as databases, excel files, csv, big data / hadoop type databases and use preconfigured tools so that obtaining this data is simple and parameterizable. Other languages such as python require the writing of complete modules, with PDI the implementation and debugging are integrated through Plug & Play tools.
    Recommendations to others considering the product:
    The Pentaho suite has a Community version that is free and free software, so our recommendation is to download it and test it to verify that this tool meets your requirements. For our part, we recommend it as we use it practically whenever we need to extract data from a data source quickly and easily.
    Information Technology and Services

    ETL with graphical interface

    Reviewed on Jun 10, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho data integration is one of the most powerful tools for building ETL processes that we use within our Smart Cities projects. It is a tool with a graphical interface that allows you to debug quickly and easily and has a multitude of preconfigured modules. Furthermore, it combines very well with the Hitachi Pentaho CDE tool for the generation of Dashboards.
    What do you dislike about the product?
    When you want to do a very simple development maybe you can choose to use Python source code directly. There are other powerful alternatives like Talend Studio.
    What problems is the product solving and how is that benefiting you?
    Pentaho Data Integration allows us to collect data from different data sources such as both relational and non-relational databases such as Big Data (HDFS), it allows us to bring information from Excel files ... and almost from any source of information we need. Also, their debugging tools save us a lot of time.
    Recommendations to others considering the product:
    Pentaho has a suite called Community that is free and available to everyone. In addition, it has many examples and information. We recommend trying it out before deciding if we need to purchase the paid version. It is a great tool and we recommend it.
    Paco T.

    PDI, best data cleaning tool

    Reviewed on Apr 21, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho comes in two editions, enterprise and community. I had experience with the community edition and here are all the advantages I see:

    1. It's under the Apache 2.0 license, so as long as you read and work under the agreements, you can have this powerful tool for free
    2. Has a very friendly user interface, so anybody, even without strong programming skills, could make some transformations in just minutes
    3. It has a wide variety of data input formats, allowing you to read from simple CSV or Excel files to databases, JSON and even S3 storage
    4. It has a lot of tools for transforming your data without coding
    5. If the functions that PDI has integrated aren't enough for you, you can add some scripting steps
    What do you dislike about the product?
    I see a strong opportunity for improving their documentation; sometimes it's kind of hard to find examples for all the functionalities that PDI offers
    What problems is the product solving and how is that benefiting you?
    I mainly use Pentaho for transforming data in the ETL cycle, so I cleanse data from different sources and store it in a DWH
    zahit B.

    Open Source ETL Tools

    Reviewed on Nov 18, 2019
    Review provided by G2
    What do you like best about the product?
    Pentaho Data Integration (PDI) is a free and open source tool for all users.
    Pentaho Data Integration (PDI) is a very high performance product compared to the paid ETL tools. The product is quite simple to use. The components on the left side of the product have all the components that the user needs. (For example; excel connection, row value, etc.) In my experience, the Logging screen is not descriptive. Sometimes you cannot identify the source of the error. Other than that, I am very satisfied with the PDI tool
    What do you dislike about the product?
    Since there are no detailed explanations of the errors on the logging screen, sometimes we cannot find the cause of the error. Also, the user community is not as strong as those of Microsoft or Oracle.
    What problems is the product solving and how is that benefiting you?
    We needed to import the data from JSON files into tables in the database. With the Pentaho Data Integration tool, we have transferred the JSON files to the database. We designed a daily job with Windows Task Scheduler.