    Pentaho Data Catalog

    Free Trial
    Automated Discovery, Classification, and Optimization - Know Your Data. Trust Your Data. Build Your Data Products.

    Overview

    For Private Offer Pricing, please contact:
    PrivateOfferPricing@pentaho.com

    Datasheet: Pentaho Data Catalog


    Pentaho Data Catalog Helps Business Users Search and Understand
    Structured and Unstructured Data Everywhere

    Without a library catalog, people cannot discover the book they need. Without a data catalog, people cannot discover the data they need. With Pentaho Data Catalog, you can see all the data you have, whatever form it takes, wherever it sits, check it, classify it and make it available to users.

    Get Faster and More Meaningful Data to Users

    A modern organization must be data fit. As data grows, so do the need for and the cost of keeping it in business-ready shape. To leverage data for business decisions and enable AI, data must be trusted, high quality and seamlessly available to data users. Now more than ever, there is a need to discover content across structured and unstructured data, on-prem and in the cloud. Organizations must monitor their data to spot trends and anomalies and maintain data hygiene at the speed of data growth. Policies for governance, life cycle and quality need to be enforced to ensure that appropriate, high-quality data is available to consumers. Data users and models can easily find and use data via the data catalog, a necessity for modern data-driven organizations.

    Powerful Business Glossary Using Machine Learning

    Pentaho Data Catalog (PDC) rapidly ingests, profiles and curates structured and unstructured data with both automation and machine learning. Fingerprinting of data and metadata rules are used to contextualize data with the language of the business documented in the business glossary. Its policy manager enables the implementation of governance and security policies.

    A powerful rules engine helps determine quality, sensitivity, and usage patterns. Activate your metadata by leveraging monitoring and notification capabilities in the product. Construct a relationship graph across business entities and terms to add semantic understanding to data.

    Data fingerprints are analyzed to determine potential duplicates, copies and similarities across data stores to assess data movement, optimization and mastering needs. Data lineage support for OpenLineage provides the ability to track data as it flows through your organization, building trust and enabling a left shift of data quality and remediation activities.

    Understand Data

    Automatically find, analyze and tag structured and unstructured data. Contextualize it with the business glossary and governance policies.

    Activate Metadata

    Observe data to define measures for data over time. Monitor metadata and act upon changes, trends and anomalies in data. Leverage event-driven architecture to apply remediation before any impact is noticed downstream.

    Data-Fit for AI

    Trusted and high-quality data is made available to decision makers with a shopping experience. Catalog users deliver data to the desired destination with a No Code Data Pipe build experience.

    Optimization and Compliance

    Measure data utilization, value and aging to make optimized storage decisions. Automated classification and characterization enable the application of life cycle, governance and access policies.

    Manage All Your Data

    Manage structured and unstructured data connected to multiple and disparate data stores, such as RDBMS Systems, File Systems (NFS, HDFS, SMB) and Object Stores.

    Governance for the Enterprise

    Bring together business and governance vocabulary, policies and standards, and apply them to data, applications and reports. Determine lineage and usage.

    Feature Rich

    Bring in reference data and usage characteristics, view semantic relationships and customize properties.

    Enterprise Scale

    Modern architecture is designed to scale with your data at petabyte scale – without affecting business or systems.

    Make Better Business Decisions with Better Data

    Data with its full context, informed of its characteristics and qualified for accuracy, sensitivity and freshness, supports correct business decisions.

    Flexibility through Modular Extensibility

    Choose the applications you need and build from there – with modules for privacy, security and governance.

    Highlights

    • Capture metadata from data sources: PDC can capture metadata for structured and unstructured data, which can be used to build a business glossary to provide business context to data.
    • Stewardship workbench: PDC provides a stewardship workbench to curate and augment captured metadata. This feature helps capture classification, usage, lineage and apply policies to data assets.
    • Build Data Market Place: PDC provides a single point of entry for the organization to capture business vocabulary, terms with descriptive information to align towards better communication. It also classifies data to determine business value as well as risk association.

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    CentOS Linux 7 x86_64 - 2211

    Typical total price

    This estimate is based on use of the seller's recommended configuration (m5.8xlarge) in the US East (N. Virginia) Region.

    $109.709/hour

    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Free trial

    Try this product at no cost for 31 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

    Pentaho Data Catalog

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time. Alternatively, you can pay upfront for a contract, which typically covers your anticipated usage for the contract duration. Any usage beyond the contract will incur additional usage-based costs.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (2)

    Instance type               Product cost/hour   EC2 cost/hour   Total/hour
    m5.4xlarge                  $108.173            $0.768          $108.941
    m5.8xlarge (Recommended)    $108.173            $1.536          $109.709

    Additional AWS infrastructure costs

    Type                                     Cost
    EBS General Purpose SSD (gp3) volumes    $0.08 per GB/month of provisioned storage
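
    The hourly totals above are the product cost plus the EC2 cost for the chosen instance type. As a rough illustration, the sketch below reproduces that arithmetic and extends it to a monthly estimate that includes gp3 storage. The rates are copied from this listing; the function names, the 730-hour month and the 500 GB volume are hypothetical values chosen for the example, not vendor or AWS recommendations, and other infrastructure charges are ignored.

```python
from decimal import Decimal

# Hourly rates in USD, copied from the usage cost table in this listing.
PRODUCT_COST_PER_HOUR = Decimal("108.173")
EC2_COST_PER_HOUR = {
    "m5.4xlarge": Decimal("0.768"),
    "m5.8xlarge": Decimal("1.536"),  # seller's recommended configuration
}
# EBS gp3 rate in USD per GB-month of provisioned storage, from this listing.
EBS_GP3_PER_GB_MONTH = Decimal("0.08")


def hourly_total(instance_type: str) -> Decimal:
    """Product cost plus EC2 cost for one hour on the given instance type."""
    return PRODUCT_COST_PER_HOUR + EC2_COST_PER_HOUR[instance_type]


def monthly_estimate(instance_type: str, hours: int, ebs_gb: int) -> Decimal:
    """Estimate one month of usage: compute hours plus provisioned gp3 storage.

    `hours` and `ebs_gb` are inputs the reader must supply; nothing in the
    listing prescribes them.
    """
    compute = hourly_total(instance_type) * Decimal(hours)
    storage = EBS_GP3_PER_GB_MONTH * Decimal(ebs_gb)
    return compute + storage


if __name__ == "__main__":
    for name in EC2_COST_PER_HOUR:
        print(f"{name}: ${hourly_total(name)}/hour")
    # Assumed values: an always-on instance (about 730 hours) with 500 GB of gp3.
    print(f"Monthly estimate: ${monthly_estimate('m5.8xlarge', hours=730, ebs_gb=500)}")
```

    Decimal arithmetic is used so the sums match the listed totals exactly rather than picking up floating-point rounding.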

    Vendor refund policy

    We do not currently support refunds, but you can cancel at any time.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Support

    Vendor support

    Customer Care Technical Support: (800) 446-0744. Our Global Support Center is available by phone 24 hours a day, 7 days a week. If your product is maintained by a Hitachi Maintenance and Support Partner, please contact them for support, based upon the contractual agreement.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    0 ratings (5 star: 0%, 4 star: 0%, 3 star: 0%, 2 star: 0%, 1 star: 0%)
    0 AWS reviews | 15 external reviews
    External reviews are sourced from G2 and are not included in the star rating for this product.
    Karthick V.

    Totally worth it!!

    Reviewed on Mar 31, 2022
    Review provided by G2
    What do you like best about the product?
    Best price in market, Hitachi sponsored and high quality in data integration.
    What do you dislike about the product?
    Limitation in features, connector is having portability issue and less user friendly.
    What problems is the product solving and how is that benefiting you?
    We used PDI for data integration for designed reports. So far, had the best experience.
    Information Technology and Services

    ETL for Dashboards

    Reviewed on Oct 08, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho Data Integration (aka Kettle) is a tool included in the Pentaho suite that we use in our Smart Cities projects to obtain data from various data sources. It has a large number of tools already built for Input, Output, Transform ... that allow developers to save a lot of time. Its use is easy even for inexperienced users.
    What do you dislike about the product?
    If we want to have support with the Pentaho suite we should not use its Community version (free), but in some Smart Cities specifications of our clients they require a free and open source tool with associated support.
    What problems is the product solving and how is that benefiting you?
    PDI allows us to obtain data from various data sources such as databases, excel files, csv, big data / hadoop type databases and use preconfigured tools so that obtaining this data is simple and parameterizable. Other languages such as python require the writing of complete modules, with PDI the implementation and debugging are integrated through Plug & Play tools.
    Recommendations to others considering the product:
    The Pentaho suite has a Community version that is free and free software, so our recommendation is to download it and test it to verify that this tool meets your requirements. For our part, we recommend it as we use it practically whenever we need to extract data from a data source quickly and easily.
    Information Technology and Services

    ETL with graphical interface

    Reviewed on Jun 10, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho data integration is one of the most powerful tools for building ETL processes that we use within our Smart Cities projects. It is a tool with a graphical interface that allows you to debug quickly and easily and has a multitude of preconfigured modules. Furthermore, it combines very well with the Hitachi Pentaho CDE tool for the generation of Dashboards.
    What do you dislike about the product?
    When you want to do a very simple development maybe you can choose to use Python source code directly. There are other powerful alternatives like Talend Studio.
    What problems is the product solving and how is that benefiting you?
    Pentaho Data Integration allows us to collect data from different data sources such as both relational and non-relational databases such as Big Data (HDFS), it allows us to bring information from Excel files ... and almost from any source of information we need. Also, their debugging tools save us a lot of time.
    Recommendations to others considering the product:
    Pentaho has a suite called Community that is free and available to everyone. In addition, it has many examples and information. We recommend trying it out before deciding if we need to purchase the paid version. It is a great tool and we recommend it.
    Paco T.

    PDI, best data cleaning tool

    Reviewed on Apr 21, 2020
    Review provided by G2
    What do you like best about the product?
    Pentaho comes in two editions, enterprise and community. I had experience with the community edition and here are all the advantages I see:

    1. It's under the Apache 2.0 license, so as long as you read and work under the agreements, you can have this powerful tool for free
    2. It has a very friendly user interface, so anybody, even without strong programming skills, could make some transformations in just minutes
    3. It has a wide variety of data input formats, allowing you to read from simple CSV or Excel files to databases, JSON and even S3 storage
    4. It has a lot of tools for transforming your data without coding
    5. If the functions that PDI has integrated aren't enough for you, you can add some scripting steps
    What do you dislike about the product?
    I see a strong opportunity for improving their documentation; sometimes it's kind of hard to find examples for all the functionalities that PDI offers
    What problems is the product solving and how is that benefiting you?
    I mainly use Pentaho for transforming data in the ETL cycle, so I cleanse data from different sources and store it in a DWH
    zahit B.

    Open Source ETL Tools

    Reviewed on Nov 18, 2019
    Review provided by G2
    What do you like best about the product?
    Pentaho Data Integration (PDI) is a free and open source tool for all users.
    Pentaho Data Integration (PDI) is a very high performance product compared to the paid ETL tools. The product is quite simple to use. The components on the left side of the product have all the components that the user needs. (For example; excel connection, row value, etc.) In my experience, the Logging screen is not descriptive. Sometimes you cannot identify the source of the error. Other than that, I am very satisfied with the PDI tool
    What do you dislike about the product?
    Since there are no detailed explanations of the errors on the logging screen, sometimes we cannot find the cause of the error. Also, the user community is not as strong as those of Microsoft or Oracle.
    What problems is the product solving and how is that benefiting you?
    We needed to import the data from JSON files into tables in the database. With the Pentaho Data Integration tool, we transferred the JSON files to the database. We scheduled a daily job with Windows Task Scheduler.