Overview
Diskover is an open-source data management solution for metadata management, data analytics, and workflow automation. Built to address the challenges of unstructured data, it provides storage visibility, file search, and integration with modern data lakes and lakehouses, helping users extract actionable insights from unstructured data efficiently. It is intended for both complex enterprise data and personal storage.
FIND AND ANALYZE - Discover and Understand Your Unstructured Data:
- Uncover hidden data with robust file search and comprehensive file indexing tools.
- Organize metadata into a seamless, accessible data catalog for streamlined discovery.
- Gain full storage visibility to monitor usage, identify inefficiencies, and eliminate redundant, obsolete, or trivial (ROT) files.
- Leverage advanced data analytics to visualize data distribution and optimize resource allocation.
- Ensure data hygiene by cleaning up duplicate or irrelevant files, improving data quality across unstructured environments.
ENRICH - Add Value to Your Unstructured Data:
- Enrich datasets with business-context metadata for better integration into reporting, analytics, and business intelligence tools.
- Enhance metadata management with custom tags and standardized metadata structures tailored for unstructured data.
- Simplify data governance by creating centralized rules for access, ownership, and compliance.
- Provide actionable insights to teams, making unstructured data more accessible and usable across your organization.
ORCHESTRATE - Automate Workflows and Optimize Unstructured Data Storage:
- Streamline operations with workflow automation, eliminating repetitive tasks and reducing manual intervention.
- Efficiently move data across storage environments with data mover capabilities for seamless migration.
- Optimize storage policies and significantly reduce costs while maximizing efficiency in managing unstructured data.
- Integrate Diskover into your existing tools and workflows using plugins, APIs, and connectors, ensuring scalability and flexibility.
AI, UNSTRUCTURED DATA, DATA LAKES, AND LAKEHOUSES - Embrace the Future of Data:
- Integrate seamlessly with data lakes and data lakehouses, enabling advanced unstructured data discovery and management.
- Enrich datasets for AI and machine learning, enhancing analytics pipelines and decision-making.
- Leverage modern data architectures with metadata-driven insights that make unstructured data organized, searchable, and valuable.
- Drive innovation with AI-ready datasets optimized for faster analysis, operational excellence, and informed decisions.
Diskover combines the power of open-source flexibility with cutting-edge tools for managing unstructured data. Its advanced capabilities for file search, storage visibility, and workflow automation make it essential for individuals and enterprises alike. Whether managing data lakes, optimizing unstructured storage, or integrating AI, Diskover is the key to unlocking the full potential of your data.
Highlights
- Powerful Data Management and Analysis: Streamline metadata management, gain complete storage visibility, and analyze your data with advanced file search, indexing, and actionable insights.
- AI-Ready for Modern Workflows: Seamlessly integrate with AI pipelines, data lakes, and data lakehouses, and send relevant data to LLMs (Large Language Models) for advanced analytics and smarter decision-making.
- Orchestration for Sustainability: Leverage the flexibility of open source to automate workflows, orchestrate data curation, and improve data hygiene, all while driving sustainable and efficient data management practices.
Details
Pricing
Instance type | Product cost/hour | EC2 cost/hour | Total/hour |
---|---|---|---|
m5.xlarge | $0.00 | $0.192 | $0.192 |
m5.2xlarge (recommended) | $0.00 | $0.384 | $0.384 |
Additional AWS infrastructure costs
Type | Cost |
---|---|
EBS General Purpose SSD (gp2) volumes | $0.10 per GB/month of provisioned storage |
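As a rough illustration of how the rates above combine, the following sketch estimates a monthly bill for each instance type. It assumes 730 hours per month (an average month with the instance always running), uses the volume sizes recommended in the usage instructions (500 GB for m5.xlarge, and 1 TB treated as 1,000 GB for m5.2xlarge), and ignores data transfer, snapshots, and taxes. The function and constant names are illustrative only and not part of any AWS or Diskover tooling.

```python
HOURS_PER_MONTH = 730      # assumption: average month, instance always on
GP2_PER_GB_MONTH = 0.10    # USD, from the infrastructure cost table

# Hourly totals from the pricing table (the product cost itself is $0.00).
HOURLY_RATE = {"m5.xlarge": 0.192, "m5.2xlarge": 0.384}


def estimate_monthly_cost(instance_type: str, volume_gb: int) -> float:
    """Return an approximate monthly cost in USD for compute plus gp2 storage."""
    compute = HOURLY_RATE[instance_type] * HOURS_PER_MONTH
    storage = volume_gb * GP2_PER_GB_MONTH
    return round(compute + storage, 2)


for instance_type, volume_gb in [("m5.xlarge", 500), ("m5.2xlarge", 1000)]:
    cost = estimate_monthly_cost(instance_type, volume_gb)
    print(f"{instance_type} with {volume_gb} GB gp2: ${cost:.2f}/month")
```

Under these assumptions the estimates come to about $190.16 per month for m5.xlarge and $380.32 per month for m5.2xlarge. Actual charges depend on region, uptime, and the volume size you provision.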
Vendor refund policy
No refund
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Additional details
Usage instructions
We recommend a 500 GB SSD EBS volume for m5.xlarge instances and a 1 TB volume for m5.2xlarge instances.
Installation instructions: https://github.com/diskoverdata/diskover-community/blob/master/INSTALL.md
Once your instance is deployed, connect to its IP address on port 8000. The default credentials are listed below; you will be prompted to change the password at your first login:
- Username: diskover
- Password: darkdata
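If the login page does not load, it can help to confirm that port 8000 is reachable before troubleshooting further. The sketch below is one way to do that from your workstation using only the Python standard library. The helper name `is_port_open` and the 3-second timeout are illustrative choices, not part of Diskover.

```python
import socket


def is_port_open(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `is_port_open("<your-instance-ip>")` returns `False` when the connection is refused or times out. In that case, a likely cause (an assumption, since it depends on your setup) is that the instance's security group does not allow inbound TCP traffic on port 8000 from your address.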
To create your first index, connect to your instance over SSH as the rocky user with your key pair:
$> ssh -i <your-keypair-path> rocky@<your-instance-ip>
Then run your first indexing job:
$> cd /opt/diskover
$> python3 diskover.py -i diskover-<indexname> <storage_top_dir>
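When indexing several directories, it may be convenient to assemble the indexing command programmatically. The following is a minimal sketch that builds the same argument list as the instructions above. The helper name, the example index name `projects`, and the path `/mnt/projects` are hypothetical. The lowercase check assumes the index is stored in Elasticsearch, which does not accept uppercase index names.

```python
import shlex
from typing import List

INDEX_PREFIX = "diskover-"  # prefix shown in the usage instructions


def build_index_command(index_name: str, top_dir: str) -> List[str]:
    """Assemble the diskover.py argument list for indexing one directory tree."""
    if not index_name or index_name != index_name.lower():
        # Assumption: the index lives in Elasticsearch, which rejects uppercase names.
        raise ValueError("index name must be non-empty and lowercase")
    return ["python3", "diskover.py", "-i", INDEX_PREFIX + index_name, top_dir]


# Hypothetical index name and storage path, for illustration only.
cmd = build_index_command("projects", "/mnt/projects")
print(shlex.join(cmd))  # shell-quoted form of the command
```

On the instance, the resulting list could be passed to `subprocess.run(cmd, cwd="/opt/diskover", check=True)` so that it runs from the Diskover directory, as in the manual steps.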
For further information, please read the documentation at https://docs.diskoverdata.com/.
Support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.