Overview
For Private Offer Pricing, please contact:
PrivateOfferPricing@pentaho.com
Datasheet: Pentaho Data Catalog
Pentaho Data Catalog Helps Business Users Search and Understand Structured and Unstructured Data Everywhere
Without a library catalog, people cannot discover the book they need. Without a data catalog, people cannot discover the data they need. With Pentaho Data Catalog, you can see all the data you have, whatever form it takes, wherever it sits, check it, classify it and make it available to users.
Get Faster and More Meaningful Data to Users
A modern organization must be data fit. As data grows, so do the need for and the cost of keeping that data in business-ready shape. To leverage data for business decisions and enable AI, data must be trusted, high quality and seamlessly available to its users. Now more than ever, there is a need to discover content across structured and unstructured data, on-premises and in the cloud. Organizations must monitor their data to spot trends and anomalies and maintain data hygiene at the speed of data growth. Governance, life cycle and quality policies need to be enforced to ensure that appropriate, high-quality data is available to consumers. Data users and models can easily find and use data via the data catalog, a necessity for modern data-driven organizations.
Powerful Business Glossary Using Machine Learning
Pentaho Data Catalog (PDC) rapidly ingests, profiles and curates structured and unstructured data with both automation and machine learning. Fingerprinting of data and metadata rules are used to contextualize data with the language of the business documented in the business glossary. Its policy manager enables the implementation of governance and security policies.
A powerful rules engine helps determine quality, sensitivity, and usage patterns. Activate your metadata by leveraging monitoring and notification capabilities in the product. Construct a relationship graph across business entities and terms to add semantic understanding to data.
Data fingerprints are analyzed to determine potential duplicates, copies and similarities across data stores to assess data movement, optimization and mastering needs. Data lineage support for OpenLineage provides the ability to track data as it flows through your organization, building trust and enabling a left shift of data quality and remediation activities.
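The idea of comparing fingerprints to spot duplicates and near-copies can be illustrated with a small sketch. This is not PDC's implementation, which is not described here. It assumes a fingerprint is simply a hash over the normalized distinct values of a column, and it uses Jaccard overlap as a stand-in similarity measure. The function names `fingerprint` and `jaccard` are hypothetical.

```python
import hashlib


def _normalize(values):
    """Return the set of distinct values, trimmed and lower-cased (assumed normalization)."""
    return {str(v).strip().lower() for v in values}


def fingerprint(values):
    """Order-insensitive fingerprint: SHA-256 over the sorted, normalized distinct values."""
    digest = hashlib.sha256()
    for item in sorted(_normalize(values)):
        digest.update(item.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def jaccard(a, b):
    """Share of distinct values the two columns have in common (0.0 to 1.0)."""
    sa, sb = _normalize(a), _normalize(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    crm_emails = ["Ann@example.com", "bob@example.com"]
    dwh_emails = ["bob@example.com ", "ann@example.com"]
    # Equal fingerprints flag the two columns as likely copies of each other.
    print(fingerprint(crm_emails) == fingerprint(dwh_emails))
    # Partial overlap suggests related, but not identical, data.
    print(jaccard([1, 2, 3], [2, 3, 4]))
```

Matching hashes only catch exact copies after normalization, which is why the sketch pairs them with an overlap score for near-duplicates.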
Understand Data
Automatically find, analyze and tag structured and unstructured data across data stores. Contextualize it with the business glossary and governance policies.
Activate Metadata
Observe data to define measures for data over time. Monitor metadata and act upon changes, trends and anomalies in data. Leverage event-driven architecture to apply remediation before any impact is noticed downstream.
Data-Fit for AI
Trusted and high-quality data is made available to decision makers with a shopping experience. Catalog users deliver data to the desired destination with a No Code Data Pipe build experience.
Optimization and Compliance
Measure data utilization, value and aging to make optimized storage decisions. Automated classification and characterization enable the application of life cycle, governance and access policies.
Manage All Your Data
Manage structured and unstructured data connected to multiple and disparate data stores, such as RDBMS Systems, File Systems (NFS, HDFS, SMB) and Object Stores.
Governance for the Enterprise
Apply business and governance vocabulary, policies and standards to data, applications and reports. Determine lineage and usage.
Feature Rich
Bring in reference data and usage characteristics, view semantic relationships and customize properties.
Enterprise Scale
Modern architecture is designed to scale with your data at petabyte scale – without affecting business or systems.
Make Better Business Decisions with Better Data
Data with its full context, informed of its characteristics and qualified for accuracy, sensitivity and freshness, supports correct business decisions.
Flexibility through Modular Extensibility
Choose the applications you need and build from there – with modules for privacy, security and governance.
Highlights
- Capture metadata from data sources: PDC can capture metadata for structured and unstructured data, which can be used to build a business glossary to provide business context to data.
- Stewardship workbench: PDC provides a stewardship workbench to curate and augment captured metadata. This feature helps capture classification, usage and lineage, and apply policies to data assets.
- Build Data Market Place: PDC provides a single point of entry for the organization to capture business vocabulary and terms with descriptive information, aligning teams for better communication. It also classifies data to determine business value as well as associated risk.
Details
Typical total price
$109.709/hour
Features and programs
Financing for AWS Marketplace purchases
Pricing
Free trial
Instance type | Product cost/hour | EC2 cost/hour | Total/hour |
---|---|---|---|
m5.4xlarge | $108.173 | $0.768 | $108.941 |
m5.8xlarge (Recommended) | $108.173 | $1.536 | $109.709 |
Additional AWS infrastructure costs
Type | Cost |
---|---|
EBS General Purpose SSD (gp3) volumes | $0.08 per GB/month of provisioned storage |
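The listed rates can be combined into a rough cost estimate. The sketch below is illustrative only: it uses the hourly rates and the gp3 storage rate from the tables above, and it assumes 730 hours per month and a hypothetical 500 GB volume. It ignores taxes, data transfer and any other AWS charges. The function names are hypothetical.

```python
from decimal import Decimal

# Assumption: average number of hours in a month, used only for estimation.
HOURS_PER_MONTH = Decimal("730")

# Hourly rates in USD, copied from the pricing table.
RATES = {
    "m5.4xlarge": {"product": Decimal("108.173"), "ec2": Decimal("0.768")},
    "m5.8xlarge": {"product": Decimal("108.173"), "ec2": Decimal("1.536")},
}

# gp3 rate in USD per GB per month of provisioned storage.
GP3_PER_GB_MONTH = Decimal("0.08")


def hourly_total(instance_type):
    """Product cost plus EC2 cost per hour for one instance."""
    rate = RATES[instance_type]
    return rate["product"] + rate["ec2"]


def monthly_estimate(instance_type, storage_gb, hours=HOURS_PER_MONTH):
    """Rough monthly cost: compute for `hours` plus provisioned gp3 storage."""
    compute = hourly_total(instance_type) * hours
    storage = GP3_PER_GB_MONTH * Decimal(storage_gb)
    return compute + storage


if __name__ == "__main__":
    for name in RATES:
        # 500 GB is a hypothetical volume size, not a vendor recommendation.
        print(name, hourly_total(name), monthly_estimate(name, 500))
```

`Decimal` is used instead of `float` so the sums match the three-decimal prices in the table exactly.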
Vendor refund policy
We do not currently support refunds, but you can cancel at any time.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Additional details
Usage instructions
- Documentation: https://docs.hitachivantara.com/r/en-us/pentaho-data-catalog/10.1.x/mk-95pdc000
- Documentation - Get started: https://docs.hitachivantara.com/r/en-us/pentaho-data-catalog/10.1.x/mk-95pdc001
- Documentation - Administer: https://docs.hitachivantara.com/r/en-us/pentaho-data-catalog/10.1.x/mk-95pdc002
- Documentation - Hyperscalers: https://docs.hitachivantara.com/r/en-us/pentaho-data-catalog/10.1.x/mk-95pdc001/hyperscalers
Resources
Vendor resources
Support
Vendor support
Customer Care Technical Support: (800) 446-0744. Our Global Support Center is available by phone 24 hours a day, 7 days a week. If your product is maintained by a Hitachi Maintenance and Support Partner, please contact that partner for support, in line with your contractual agreement.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Totally worth it!!
Has portability issues and is less user-friendly.
ETL for Dashboards
ETL with graphical interface
PDI, best data cleaning tool
1. It's under the Apache 2.0 license, so as long as you read and work within the agreement, you can have this powerful tool for free
2. It has a very friendly user interface, so anybody, even without strong programming skills, can build transformations in just minutes
3. It supports a wide variety of data input formats, allowing you to read from simple CSV or Excel files, databases, JSON and even S3 storage
4. It has a lot of tools for transforming your data without coding
5. If the functions built into PDI aren't enough for you, you can add scripting steps
Open Source ETL Tools
Pentaho Data Integration (PDI) is a very high-performance product compared to paid ETL tools. The product is quite simple to use. The panel on the left side of the product has all the components that the user needs (for example, Excel connection, row value, etc.). In my experience, the Logging screen is not descriptive. Sometimes you cannot identify the source of an error. Other than that, I am very satisfied with the PDI tool.