Overview
Key Features
• Multi-Source Data Integration: Supports relational databases (Oracle, MySQL, PostgreSQL, SQL Server) and file-based data (CSV, TSV, etc.).
• Machine Learning Model: Uses ML algorithms to analyze metadata and identify relationships between tables.
• Business Categorization: Determines the business relevance of table columns using automated classification techniques.
• Validation GUI: Lets clients review and validate metadata through an interactive GUI.
• Metadata Storage: Captures and stores metadata in a MySQL database for further analysis.
• Synthetic Data Generation: Uses multiple libraries to generate synthetic datasets for testing and development.
• Cloud Storage Integration: Exports processed and synthetic data to cloud-based storage (AWS S3, Blob Storage, etc.).
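To make the relationship-identification feature more concrete, the following is a minimal sketch of how candidate links between tables could be proposed from column values. It is a rule-based stand-in (value-overlap check against a known key column), not the ML model the tool uses; the function name `candidate_relationships`, the `Relationship` record, the `min_coverage` threshold, and the sample tables are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str
    coverage: float  # share of non-null child values found in the parent key


def candidate_relationships(tables, primary_keys, min_coverage=0.95):
    """Propose foreign-key-like links based on value overlap.

    tables:       {table_name: {column_name: [values]}}  (hypothetical in-memory form)
    primary_keys: {table_name: key_column_name}
    """
    found = []
    for parent, key in primary_keys.items():
        parent_values = set(tables[parent][key])
        for child, columns in tables.items():
            if child == parent:
                continue
            for column, values in columns.items():
                non_null = [v for v in values if v is not None]
                if not non_null:
                    continue
                hits = sum(1 for v in non_null if v in parent_values)
                coverage = hits / len(non_null)
                if coverage >= min_coverage:
                    found.append(Relationship(child, column, parent, key, coverage))
    return found


# Illustrative sample data, not taken from any real schema.
tables = {
    "customers": {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {
        "order_id": [10, 11, 12, 13],
        "customer_id": [1, 1, 3, None],
        "total": [9.5, 20.0, 7.25, 3.0],
    },
}
primary_keys = {"customers": "id", "orders": "order_id"}

rels = candidate_relationships(tables, primary_keys)
for rel in rels:
    print(rel)
```

A heuristic like this could serve as a baseline or as a feature for a learned model; candidates it proposes would still go through the validation GUI before being stored as metadata.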
Development Options
• Programming Language: Python
• Data Processing SDKs: Microsoft Presidio (for sensitive data detection and anonymization)
• Database Systems: MySQL, PostgreSQL, Oracle, SQL Server
• GUI Development: Client-side web application
• Synthetic Data Libraries: Faker, Mimesis
• Cloud Storage: AWS S3, Blob Storage
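As an illustration of schema-driven synthetic data generation, the sketch below produces reproducible fake rows from a column-to-type mapping. It deliberately uses only Python's standard library as a stand-in for Faker or Mimesis, so it runs without extra dependencies; the `synthetic_rows` function, the type labels (`int`, `float`, `name`, `email`), and the name lists are assumptions for this example, not part of the tool.

```python
import random
import string

# Small illustrative vocabularies; a library such as Faker or Mimesis
# would supply far richer, locale-aware values.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]
LAST_NAMES = ["Ito", "Khan", "Lopez", "Meyer", "Novak"]


def synthetic_rows(schema, n, seed=0):
    """Generate n rows for a {column_name: type_label} schema.

    A fixed seed makes the output repeatable, which helps when the data
    is used in automated tests.
    """
    rng = random.Random(seed)
    generators = {
        "int": lambda: rng.randint(1, 10_000),
        "float": lambda: round(rng.uniform(0, 1_000), 2),
        "name": lambda: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "email": lambda: "".join(rng.choices(string.ascii_lowercase, k=8))
        + "@example.com",
    }
    unknown = set(schema.values()) - set(generators)
    if unknown:
        raise ValueError(f"unsupported column types: {sorted(unknown)}")
    return [
        {column: generators[kind]() for column, kind in schema.items()}
        for _ in range(n)
    ]


schema = {"customer_id": "int", "full_name": "name", "email": "email", "balance": "float"}
rows = synthetic_rows(schema, n=5, seed=42)
for row in rows:
    print(row)
```

Driving generation from the stored column metadata keeps the synthetic dataset structurally aligned with the source tables while containing no real values.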
AWS Tools for Logging & Monitoring
• Amazon CloudWatch: Monitors logs, metrics, and alerts for application health tracking.
• AWS Lambda Logging: Logs function executions and errors for debugging.
• Amazon S3 Logging: Tracks access and modification history of stored data.
• AWS IAM Policies: Enforce secure access control for data storage and processing services.
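One way the application side of this logging setup might look is sketched below: a formatter that emits one JSON object per log line. AWS Lambda forwards a function's standard output to CloudWatch Logs, and JSON-formatted lines are easier to filter and query there. The logger name `metadata_pipeline`, the `context` field, and the `build_logger` helper are assumptions for illustration; only Python's standard `logging` module is used.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical field passed via logging's `extra`.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name="metadata_pipeline", stream=None):
    """Create a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Demonstration: capture output in memory instead of stdout.
buffer = io.StringIO()
logger = build_logger(stream=buffer)
logger.info("table scanned", extra={"context": {"table": "orders", "rows": 4}})
entry = json.loads(buffer.getvalue())
print(entry)
```

In a deployed function the `stream` argument would be left at its default so that records go to standard output.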
Key Benefits
• Automated Metadata Processing: Reduces manual effort in analyzing database schemas and table relationships.