Overview
Consistently aggregates disparate files (JSON, CSV, XML) securely and accurately into a data lake for analytics. Enables better data-driven decisions across financial reports, customer trends, supply chain and other functions. Ensures data security, quality and consistency.
Typical POC Length: 7 days
Up to 3 JSON, CSV, or XML files (mix and match)
Data Lake (S3)
- Three areas: ingestion, organized, analysis
Change Data Capture (S3 Event Notifications, Lambda Functions)
- Incremental, cumulative, cumulative-ytd, cumulative-mtd
- CDC process automatically updates the "organized" section of the data lake
- Data is stored in Parquet format for better performance and lower cost
Transformation (Glue Job, Python Shell)
- Data aggregation and analysis
- Transformed data can be explored/queried using Athena
Publishing
- Transformed data is pushed to RDS or Aurora
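As a rough illustration of the Change Data Capture step, the sketch below shows how a Lambda function triggered by an S3 Event Notification might decide where an incoming file belongs in the "organized" area. It is a minimal sketch, not the actual implementation: the key layout `ingestion/<mode>/<dataset>/<file>`, the target layout, and the names `plan_targets`, `LOAD_MODES`, and `SOURCE_SUFFIXES` are all assumptions made for illustration. It only plans the target Parquet key; the conversion and write are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# The four CDC variants listed above; treating them as a path segment is an assumption.
LOAD_MODES = {"incremental", "cumulative", "cumulative-ytd", "cumulative-mtd"}
# Source formats accepted by the POC.
SOURCE_SUFFIXES = {".json", ".csv", ".xml"}


def plan_targets(event):
    """Map each S3 event record to a planned Parquet key in the organized area.

    Assumes a hypothetical key layout: ingestion/<mode>/<dataset>/<file>.
    Records that do not match the layout are skipped.
    """
    plans = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        parts = PurePosixPath(key).parts
        if len(parts) != 4 or parts[0] != "ingestion":
            continue
        _, mode, dataset, filename = parts
        name = PurePosixPath(filename)
        if mode not in LOAD_MODES or name.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        plans.append({
            "bucket": bucket,
            "source_key": key,
            "mode": mode,
            # Hypothetical target layout inside the organized area.
            "target_key": f"organized/{dataset}/{mode}/{name.stem}.parquet",
        })
    return plans


def lambda_handler(event, context):
    # A real handler would convert each source file and write the Parquet
    # output; this sketch only returns the plan.
    return {"planned": plan_targets(event)}
```

Keeping the routing logic in a plain function, separate from any AWS client calls, makes it possible to exercise it locally with a hand-built event before wiring it to a bucket.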
Technologies: Amazon S3, AWS Lambda, AWS Glue, Python, Amazon Athena, Amazon RDS or Amazon Aurora
Use Cases:
Drug Use & Health Analytics: Aggregate, store, and process very large datasets from different agencies within a country government to study the effects of opioid use and understand how Human Services can better serve citizens.
Government Services: Ingest, cleanse, aggregate, analyze, publish, and present data from 26+ government agencies, in various data formats, to understand how citizens use the services provided by the county.
Highlights
- Quickly and accurately aggregate data for analytics
- Ensure data quality and security
- Typical POC in 7 days or less