Overview
HCLTech’s Intelligent Ingestion solution demonstrates how to define an end-to-end ETL workflow that can be triggered by an AWS Step Functions job, which internally invokes multiple state machines in parallel for a seamless ETL experience: reliable parallel data ingestion, curation, and aggregation through to consumption in an efficient and cost-effective manner. The entire solution is parameterized to provide a no-code/low-code experience: users can add any desired data source by simply passing the source details in a standardized format, and the solution builds the ETL ingestion jobs on the fly for newly added data sources.
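The standardized source-definition format itself is not published in this listing. Purely as an illustration, registering a new data source might look like the following sketch, where all field names and the validation helper are hypothetical:

```python
# Hypothetical source-definition record; the actual standardized format used
# by Intelligent Ingestion is not shown in this listing.
REQUIRED_FIELDS = {"source_name", "source_type", "connection", "target_raw_path"}

def validate_source_config(cfg: dict) -> list:
    """Return the sorted list of missing required fields (empty == valid)."""
    return sorted(REQUIRED_FIELDS - cfg.keys())

new_source = {
    "source_name": "orders_db",
    "source_type": "rdbms",  # e.g. rdbms | logs | batch | edge
    "connection": "jdbc:mysql://example-host:3306/orders",
    "target_raw_path": "s3://my-raw-zone/orders/",
}

print(validate_source_config(new_source))  # → []
```

A record like this would be the only input a user supplies; the solution derives the crawler, ingestion, curation, and enrichment jobs from it.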
HCLTech’s Intelligent Ingestion solution can start execution via an automatic event trigger mechanism or via a scheduler.
The main state machine starts the process by invoking the crawler state machine, which internally executes the AWS Glue crawlers to connect to and scan data from various sources, such as RDBMS, edge devices, logs, and batch data, while simultaneously creating metadata in a centralized AWS Glue Data Catalog. A single Glue job then ingests data in parallel into the Amazon S3 raw zone for the different databases/tables.
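The fan-out described above could be expressed in Amazon States Language (ASL) with a Map state that iterates over the configured sources, running a crawler step and an ingestion step per source. The fragment below is a minimal sketch, not the vendor's actual definition; all state names, ARNs, and the job name are placeholders:

```python
import json

# Illustrative ASL fragment: a Map state fans out over the configured
# sources; each iteration runs the crawler state machine, then a shared
# Glue ingestion job. Resource ARNs use real Step Functions service
# integrations, but every name here is hypothetical.
main_definition = {
    "StartAt": "IngestAllSources",
    "States": {
        "IngestAllSources": {
            "Type": "Map",
            "ItemsPath": "$.sources",
            "MaxConcurrency": 10,
            "Iterator": {
                "StartAt": "RunCrawler",
                "States": {
                    "RunCrawler": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::states:startExecution.sync",
                        "Parameters": {
                            "StateMachineArn": "arn:aws:states:REGION:ACCOUNT:stateMachine:crawler-sm"
                        },
                        "Next": "IngestToRawZone",
                    },
                    "IngestToRawZone": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::glue:startJobRun.sync",
                        "Parameters": {"JobName": "raw-ingestion-job"},
                        "End": True,
                    },
                },
            },
            "End": True,
        }
    },
}

print(json.dumps(main_definition, indent=2)[:40])
```

Because the Glue job is parameterized per Map iteration, one job definition serves every source, which is what keeps the pipeline no-code for newly added sources.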
Once data ingestion completes successfully, an email alert is sent to the user via Amazon SNS, and data quality validation is auto-executed by a separate state machine that is plugged into the main state machine.
Data quality validation is performed by AWS Glue DataBrew jobs. Pass or fail results are published based on the predefined, configured threshold.
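The listing does not specify how the threshold is applied; one plausible reading is that a run passes only if the fraction of passing DataBrew rules meets the configured threshold. The sketch below assumes that interpretation, along with an invented result shape and a 0.95 threshold:

```python
# Hedged sketch of the threshold check: the per-rule result shape and the
# 0.95 default are assumptions, not the solution's documented behavior.
def quality_gate(rule_results: list, threshold: float = 0.95) -> str:
    passed = sum(1 for r in rule_results if r["status"] == "SUCCEEDED")
    ratio = passed / len(rule_results)
    return "PASS" if ratio >= threshold else "FAIL"

results = [
    {"rule": "no_null_ids", "status": "SUCCEEDED"},
    {"rule": "valid_dates", "status": "SUCCEEDED"},
    {"rule": "row_count_min", "status": "FAILED"},
]
print(quality_gate(results))  # → FAIL (2/3 ≈ 0.67, below 0.95)
```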
Upon data quality validation failure, AWS Lambda creates a high-severity ServiceNow ticket and an email alert is sent to the respective team for further troubleshooting. If data quality validation passes, a success email alert is triggered to the respective team and the solution executes the curation state machine next.
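On the failure branch, the Lambda function would POST an incident to ServiceNow's standard Table API (`/api/now/table/incident`). The payload builder below is an assumption-laden sketch: the field values, helper name, and severity mapping are illustrative, not the vendor's implementation:

```python
# Sketch of the high-severity incident payload a Lambda function could send
# to the ServiceNow Table API. All values are example choices; in
# ServiceNow's default scale, urgency/impact "1" means High.
def build_incident(source: str, run_id: str) -> dict:
    return {
        "short_description": f"Data quality validation failed for {source}",
        "description": (
            f"Intelligent Ingestion run {run_id}: "
            "DataBrew checks fell below the configured threshold."
        ),
        "urgency": "1",
        "impact": "1",
        "category": "Data Pipeline",
    }

incident = build_incident("orders_db", "run-0042")
print(incident["short_description"])
```

The actual HTTP call (authenticated POST with this dict as the JSON body) is omitted, since it requires ServiceNow credentials.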
The curation state machine internally executes a single AWS Glue curation job that runs in parallel, applying the transformations for the respective data sources.
If the curation job succeeds, an email alert is sent to the respective team via Amazon SNS and the curated data is stored in the Amazon S3 curation zone.
Once the curation state machine completes successfully, the final enrichment state machine executes; internally, a single Glue job performs the aggregations in parallel for the respective data sources.
Upon completion of all jobs configured in the Intelligent Ingestion solution, a final success email is triggered to the respective teams and the aggregated data is stored in the Amazon S3 enrichment zone.
Each state machine is loosely coupled, enabling the business to easily plug any specific state machine into, or out of, the main state machine.
The AWS Glue Data Catalog and Amazon S3 are governed by AWS Lake Formation, and access privileges are granted based on the relevant IAM groups or IAM roles.
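A Lake Formation grant of the kind described above maps to the parameters of the `boto3` `lakeformation.grant_permissions` call. The dict below is a sketch; the role ARN, database, and table names are placeholders:

```python
# Parameters for a Lake Formation grant giving an IAM role SELECT access to
# one cataloged table. All identifiers are placeholders for illustration.
grant_params = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::ACCOUNT:role/analyst-role"
    },
    "Resource": {
        "Table": {
            "DatabaseName": "curated_db",
            "Name": "orders",
        }
    },
    "Permissions": ["SELECT"],
}

# With AWS credentials configured, the grant itself would be:
# boto3.client("lakeformation").grant_permissions(**grant_params)
print(sorted(grant_params))
```

Per-team grants like this are what let the same catalog and S3 zones serve multiple consumers with different privileges.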
This solution comprises the following key pillars, each with several features that are crucial for building a robust and complete ingestion solution:
- Schema Evolution
- Fully Automated ETL Pipeline
- Re-Usable ETL Workflow & Transformation
- Centralized Data Catalog
- Data Quality & Governance
- Dynamic Alert Mechanism, Incident Reporting & Management
Highlights
- The Intelligent Ingestion solution uses one single job to trigger the entire ETL flow, including data ingestion, data quality, curation, and enrichment.
- It also follows a standard, parameterized approach for handling all data types and sources, with provision for concurrent ETL job executions, making it a fully scalable solution that manages bulk ingestion seamlessly.
- It separates the transformation logic from the ETL process, making it reusable, scalable, and maintainable.
Details
Pricing
Custom pricing options
Legal
Content disclaimer
Support
Vendor support
Please contact us at digitaltransformation@hcl.com, mentioning the solution you are interested in, to learn more about deployment and our support.