AWS Public Sector Blog

AWS AI agents automate citation validation and formatting to save time and instill confidence

AWS branded background with text "AWS AI agents automate citation validation and formatting to save time and instill confidence"

Any organization that receives proposals or issues publications understands the importance of checking the accuracy and validity of cited sources in those documents. Proposals hinge on the accuracy of this information, yet significant time can be consumed verifying the source’s existence, accuracy, and relevance to the topic. The same can be said about validating scientific article submissions and reviewing student papers, theses, and dissertations. Verifying each type of reference necessitates a systematic approach to review. Furthermore, although it takes a tremendous amount of time and can be tedious, verification of cited work is crucial to instill confidence in the information presented in these documents. Now you can save time and instill confidence in these documents more quickly by using Amazon Quick Automate, which creates a set of agents to complete those tasks.

The challenge

Today’s generative AI landscape presents a unique challenge concerning citation validation. Generative AI models can produce plausible-sounding content that may be inaccurate, biased, or even entirely fabricated (referred to as “hallucinations”). Therefore, it is critical to validate any information obtained from these tools, such as citations.

Here’s why citation validation is essential when using generative AI:

  • Risk of fabricated citations: Generative AI tools have been shown to create completely fake citations, sometimes even including real author names or publication venues to make them appear more convincing.
  • Inaccurate information in valid sources: Even if the AI cites a real source, the information attributed to that source might be misrepresented or incorrect.
  • Outdated information: Some AI models may not have access to the most up-to-date information, potentially leading to references to outdated sources.
  • Bias in AI-generated content: AI models can reflect biases present in their training data, potentially leading to biased information or the citation of biased sources.
  • Plausible-sounding, yet untrue statements: AI can generate statements that sound convincing but are factually incorrect.

Introducing Amazon Quick Automate

Quick Automate ushers in a new era of enterprise automation by combining the power of generative AI with the comprehensive cloud capabilities of Amazon Web Services (AWS) to transform how organizations automate their business processes. Quick Automate represents a significant leap forward in enterprise automation by offering:

  • AI-powered workflow creation: Organizations can now describe their automation needs using natural language, videos, or documentation. The service’s advanced AI analyzes these inputs and automatically generates detailed workflow plans that can be refined and implemented.
  • Adaptive user interface (UI) automation: Using a sophisticated Large Action Model (LAM), the service can intelligently interact with any user interface, adapting in real-time to changes in layouts or unexpected scenarios. This dramatically reduces maintenance overhead and improves reliability.
  • Unified automation platform: Unlike traditional solutions that need multiple tools, Quick Automate provides a single, comprehensive platform that combines UI automation, API integrations, and workflow orchestration to streamline management and reduces costs.

The service introduces several transformative capabilities:

  • Intelligent planning: A proprietary multi-modal large language model (LLM) analyzes process requirements and automatically generates optimized workflow plans that can be modified to create automation following best practices.
  • Dynamic execution: Advanced AI agents, such as the browser agent and integrated Amazon Bedrock agents, can handle complex decision-making and adapt to changing conditions during workflow execution. These agents can handle complex decision-making and adapt to changing conditions during workflow execution.
  • Smart human integration: Seamless human-in-the-loop capabilities provide appropriate oversight while maximizing automation efficiency.
  • Enterprise-grade governance: Built-in controls provide proper access management and data security across all automated processes.

Combining these capabilities with the serverless infrastructure and pay-per-use pricing model of AWS, Quick Automate makes sophisticated automation accessible to organizations of all sizes while significantly reducing total cost of ownership.

Powered by Amazon Bedrock in-line and UI agents, Quick Automate creates intelligent agentic automation that performs multiple validation tasks in parallel, integrating with external academic databases and search engines, while maintaining citation compliance standards. As a result, organizations can automate high volumes of transactions and complex workflows that span multiple applications and systems while maintaining reliability and accuracy.

Example solution

The following use case leverages Quick Automate to validate citations by checking the American Psychological Association (APA) format compliance, verifying citation links, and extracting metadata from academic papers. The system uses Amazon Bedrock for AI-powered validation and implements human-in-the-loop verification for quality assurance.

Solution architecture

The following figure shows the solution architecture.

Figure 1. Architectural diagram of the Citation Validation Solution described in this post. The major components are Secure File Transfer Protocol (SFTP) connections, AWS Transfer Family, Amazon Simple Email Service (Amazon SES), Amazon S3, Amazon Bedrock, Amazon Quick Automate, Amazon Comprehend, Amazon Textract, LAMs, and Amazon Bedrock LLMs

Key workflow steps

  1. Document ingestion
    The citation validation process begins with a flexible ingestion system that accommodates multiple input channels. Organizations can upload citations through Secure File Transfer Protocol (SFTP) connections using AWS Transfer Family, while individual providers can submit citations directly via email using Amazon Simple Email Service (Amazon SES). All incoming documents are automatically routed to a designated Amazon Simple Storage Service  (Amazon S3) bucket, where they undergo initial processing to extract individual citations and create structured records for validation.
  1. AI-powered APA format validation
    AWS uses the diverse set foundation models (FMs) in Amazon Bedrock to perform sophisticated APA format validation. In this use case, a Claude 3 Sonnet model was chosen. However, users have the flexibility to choose from a wide range of models depending on their specific use case. The system uses the model to automatically analyze each citation’s structure, generating detailed compliance scores and identifying potential formatting issues. When citations don’t meet the necessary confidence threshold, the system seamlessly integrates with human-in-the-loop functionality, creating targeted review tasks for expert validation. When the human reviews, confirms, and provides the recommendation back asynchronously, Quick Automate completes the remaining steps.
  1. Citation link verification
    The browser agent in Quick Automate—powered by Amazon ComprehendAmazon Textract, LAMs, and Amazon Bedrock LLMs—implements a robust, multi-layered approach to link verification. The system first attempts to validate citations through their provided URLs. In this solution we opted for PubMed as a secondary validation source. If the direct link is inaccessible, then the solution automatically searches PubMed’s public website for validation using the agent’s reasoning capabilities. Upon successful navigation to either source, the system methodically extracts the following essential bibliographic data:
    • Journal name
    • Volume information
    • Publication date
    • Page numbers
    • Article titles
    • Author names
  1. Data validation
    The data verification process represents a sophisticated application of AI-powered text analysis powered by the Amazon Bedrock LLM models to validate data (for example, author identities) across citations. At its core, the system employs intelligent pattern recognition to handle the complexities of academic representations from abbreviated formats (for example, full names to abbreviated names). The verification engine applies a series of matching algorithms that look beyond surface-level text, considering common variations in presentations while maintaining accuracy (for example, name identification).
  1. Data processing
    The final stage of the workflow brings together all validation outcomes into a comprehensive results management system. The solution consolidates findings from each validation checkpoint, such as format compliance scores, link verification status, and author matching results, into a structured output format. These results are securely stored in Amazon S3, which maintains a complete audit trail of all validation activities. The system automatically generates notifications for relevant stakeholders, making sure of transparent and efficient communication of validation outcomes.
  1. Governance
    Quick Automate implements a multi-layered governance framework that enforces strict access controls and process oversight. The automation uses  AWS Identity and Access Management (IAM) roles with least-privilege principles, restricting Amazon S3 bucket access to specific paths within the bucket for both input retrieval and output storage. The Amazon Bedrock interactions are controlled through dedicated service roles, limiting access to specific models and predetermined prompt patterns. IAM policies govern the browser agent’s activities and define permissible navigation patterns and data extraction boundaries. Human-in-the-loop tasks are managed through a role-based queue system, which tracks task assignments and resolutions with clear ownership and audit trails. The workflow maintains segregation of duties between teams using automation groups.

Benefits of using Quick Automate for citation validation and formatting

Quick Automate provides end-to-end automation support through both agentic (AI-powered, adaptive) and deterministic (rule-based, predictable) approaches. This makes sure that the right automation method is applied to each task within a workflow.

The following are four aspects of the end-to-end process automation and orchestration:

  • Authoring studio
    This features an intuitive interface where users can create and manage workflows using AI-powered planning agents. The studio facilitates natural language inputs and collaborative workflow development, streamlining the automation creation process by reducing time to build intelligent automations.
  • Fully managed service
    This operates as a serverless, AWS-managed service, eliminating the need for infrastructure management and maintenance while making sure of optimal performance and reliability.
  • Governance and responsible AI
    This implements enterprise-grade security controls with comprehensive role management, governance frameworks, and complete audit trails. It provides responsible AI usage with built-in safeguards and monitoring capabilities.
  • Consumption-based pricing
    This uses a flexible pricing model tied directly to process execution, which allows organizations to pay only for what they use without upfront commitments or fixed costs.

Proof of concept

The following screenshots showcase the solution in Quick Automate with input and output.

Figure 2. Input – List of citations and S3 bucket

Figure 3. Automation project – Project page in Quick Automate

Figure 4. Automation project – Authoring canvas (Low code/no code) and chat assistant

Figure 5. Execution – UI streaming, logs and audit trail, human-in-the-loop task creation

Figure 6. Execution – Agent thinking in audit trail

Figure 7. Human-in-the-loop task forms for manual verification and transparency

Figure 8. Output with status and comments

Curious to learn more?

Learn more about building game-changing citation validation and formatting agentic AI on AWS. To get access to a cutting edge private beta of Amazon Q Automation, email mballal@amazon.com and request access.

Learn more about AWS solutions for healthcare and life sciences.

Kushal Bhattacharya

Kushal Bhattacharya

As an automation specialist on the AWS Agentic AI Product team, Kushal focuses on transforming business processes using cutting-edge RPA and AI technologies. He brings more than a decade of analytics experience, complemented by his MBA and science degree, which allows him to bridge the gap between technical solutions and business strategy. Drawing on his extensive knowledge of AWS technologies and generative AI solutions, Kushal implements smart automation systems that prioritize both customer success and process efficiency, consistently driving operational excellence through the latest technological innovations.

Dr. Dawn Heisey-Grove

Dr. Dawn Heisey-Grove

Dawn is a federal public health leader on the U.S. Federal Civilian team. She has spent her career finding new ways to use existing or new data to modernize public health surveillance and research. With a background in public health and informatics, Dr. Heisey-Grove leads innovation and modernization in public health agencies, supporting infectious and chronic disease, bioinformatics, environmental health, and more.

Gargi Singh Chhatwal

Gargi Singh Chhatwal

Gargi is a senior solutions architect with AWS supporting worldwide public sector federal civilian customers. He has expertise in high performance computing and AI/ML. Gargi has eight years of experience helping public sector customers leverage AWS technology to design, build, and scale solutions that enable scientific research advancements.

Jon Lemon

Jon Lemon

Jon is a senior customer solutions manager at AWS supporting the National Institutes of Health. He has over 20 years of experience in the field with expertise in implementing advanced analytics, machine learning, and artificial intelligence solutions that enable federal government organizations to leverage data for more efficient, timely, and cost-effective decision-making.