Unlock sustainability insights by converting public utility invoices into machine data
This Guidance demonstrates how organizations can derive sustainability insights by converting utility invoices into machine-readable data using an automated pipeline and generative AI technology. By automating the extraction of key data points from any utility invoice, organizations can normalize the data and gain insights into their emissions usage. This addresses the challenge businesses face in emissions footprint reporting and insights based on their real estate portfolio, given the lack of programmatic access to invoice data.
Architecture Diagram
[Architecture diagram description]
Step 1
Any ingestion mechanism can be used to move invoice documents into Amazon Simple Storage Service (Amazon S3) for processing, such as Amazon API Gateway, Amazon Simple Email Service (Amazon SES), or the AWS SDKs.
Step 2
Invoice documents arrive in an Amazon S3 bucket, trigger event notifications in Amazon EventBridge, and start a processing workflow in AWS Step Functions.
Step 3
AWS Lambda converts the PDF document into images and saves them to Amazon S3.
Step 4
The images are combined with a text-based prompt. The full prompt is saved to Amazon S3.
Step 5
Amazon Bedrock is called directly from Step Functions using an optimized integration. The prompt instructs the foundation model to interpret the image and generate a standard structured JSON output.
Step 6
Standardized utility invoice data is stored in an output Amazon S3 bucket to integrate with downstream analytics capabilities such as Amazon QuickSight or with Data Lakes on AWS.
Step 7
An event is published through EventBridge. This event can be used to notify end users or other systems that processing has completed.
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
This Guidance uses standard service metrics to monitor the health of individual pipeline components, such as Lambda function concurrency or error rate. In addition, Amazon CloudWatch metrics, alarms, and dashboards can be customized to monitor the operational health of this Guidance and notify operators of any faults.
-
Security
The resources deployed through this Guidance are safeguarded through the policies and principles of AWS Identity and Access Management (IAM). Least-privilege access and role-based access controls should be used to grant operators the necessary permissions to modify resources, such as deploying an updated stack through AWS CloudFormation.
-
Reliability
This Guidance uses infrastructure-as-code in an AWS Cloud Development Kit (CDK). These CDK stacks are deployed through CloudFormation for resilient change management that will automatically rollback if a fault is detected during deployment. In addition, by using loosely coupled dependencies like EventBridge rules, this Guidance can handle ingestion events from Amazon S3 and implement retries on downstream quota limits. These services enhance reliability through a decoupled, scalable, and serverless architecture, allowing for automatic scaling, reliable event processing, reduced operational overhead, and consistent, repeatable deployments through infrastructure-as-code.
-
Performance Efficiency
This Guidance is designed to scale in order to meet the processing requirements for utility invoices. It employs a queueing mechanism to regulate the rate at which invoices are processed. For customers with large, consistent inference workloads, they can request an increase to the Lambda concurrency limit and use the provisioned throughput model of Amazon Bedrock so that the application's performance needs are adequately addressed.
-
Cost Optimization
The selection of serverless technologies that can be configured with this Guidance reduces costs that are directly correlated to the number of invoices processed. For the storage of binary invoice documents, this Guidance uses the Amazon Simple Storage Service Intelligent-Tiering storage class or Amazon S3 Lifecycle configuration policies. These policies can lower the long-term storage costs or eliminate the long-term storage of documents entirely.
-
Sustainability
This Guidance uses Amazon S3 to store invoices, which represent the largest data type within the application. AWS customers have the ability to make minor adjustments to achieve their ideal data storage configuration by using Amazon S3 Intelligent Tiering or Amazon S3 Lifecycle policies, as outlined in the Cost Optimization section. Amazon S3 enables the optimization of data storage through energy-efficient tiers while also reducing the carbon footprint through the shared infrastructure and renewable energy usage.
Related Content
[Title]
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.