Streamlining naturalization applications with Amazon Bedrock
Public sector organizations worldwide face a common challenge: processing an ever-growing volume of document-heavy applications across various services. From naturalization procedures to asylum applications and university admissions, many crucial processes still rely on manual or partially manual methods, leading to significant backlogs, extended processing times, and increased costs.
This post explores how Amazon Bedrock can be used to address these challenges, focusing on streamlining naturalization applications. While we focus on naturalization as our primary example, the solution discussed can be applied to any public sector use case involving large-scale document processing.
Take naturalization applications as an example. These typically require multiple documents to verify an applicant’s eligibility, including proof of identity, proof of residency, and tax documents. The impact of inefficient processing is evident globally:
- In 2023, Ireland received 20,650 naturalization applications while grappling with a backlog of 15,000 applications from previous years, resulting in an average processing time of 19 months.
- In the United States, the US Citizenship and Immigration Services (USCIS) received approximately 781,000 naturalization applications in fiscal year (FY) 2022 and completed nearly 1,076,000 applications—a 20 percent increase from FY 2021 and the highest in nearly 15 years.
- The UK faced similar pressures, with 210,465 citizenship applications in the year ending June 2023.
These challenges aren’t unique to naturalization. Asylum applications often involve complex documentation from various sources, and university admissions require processing transcripts, recommendation letters, and other supporting materials from a large pool of applicants.
Many agencies still rely on outdated methods to process these applications:
- Manual review – Human agents physically examining each document
- Basic digital tools – Simple document management systems with limited automation
- Siloed information – Lack of integration between different stages of the application process
These limitations result in systems struggling to keep pace with demand, frustrating applicants and creating inefficiencies for government agencies across various services.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon using a single API. It provides a broad set of capabilities needed to build generative AI applications with security, privacy, and responsible AI. For this solution, we cover the technical implementation using Anthropic’s Claude 3.5 Sonnet large language model (LLM) on Amazon Bedrock.
For naturalization applications, LLMs offer key advantages. They enable rapid document classification and information extraction, which means easier application filing for the applicant and more efficient application review for the immigration officer. LLMs also promote consistency in application evaluation, reducing potential bias, and provide the scalability to handle large volumes of applications, even during surges like those seen during the COVID-19 pandemic. By using LLMs through Amazon Bedrock, government agencies can significantly reduce processing times, improve accuracy in application assessment, and allocate human resources more effectively, transforming their document processing workflows.
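To ground this in code, the following is a minimal sketch of calling Anthropic’s Claude 3.5 Sonnet through the Amazon Bedrock runtime API with boto3. The Region, prompt, and token limit are illustrative placeholders, not values from the solution:

import json

import boto3

# Amazon Bedrock runtime client (Region is illustrative)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID for Anthropic's Claude 3.5 Sonnet on Amazon Bedrock
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Classify this document and extract its key fields as JSON."}
    ]
})

response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])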
Solution process flow
To demonstrate this solution, we use Ireland’s naturalization process as an example and walk you through the step-by-step flow. The simplified criteria for this process include the following requirements for naturalization:
- A valid passport from the applicant’s home country as proof of identity.
- Proof of residency for three years, evaluated using a scoring system. Each applicant must score 150 points for each of the three years by providing one document from each of the following categories (see the scoring sketch after this list):
  - Type A: Examples include Employment Detail Summaries or Department of Social Protection Annual Contributions, which grant 100 points.
  - Type B: Examples include utility bills, which grant 50 points.
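To make the scoring rule concrete, here is a minimal sketch of how the 150-point check could be computed. The document dictionaries mirror the JSON fields extracted later in this post; the function itself is illustrative, not the solution’s code:

# Points granted per document category, per the criteria above
CATEGORY_POINTS = {"A": 100, "B": 50}
REQUIRED_POINTS_PER_YEAR = 150

def score_year(documents, year):
    """Score one residency year; only one document per category counts."""
    seen_categories = set()
    points = 0
    for doc in documents:
        category = doc.get("Document_category")
        if doc.get("Year") == year and category not in seen_categories:
            seen_categories.add(category)
            points += CATEGORY_POINTS.get(category, 0)
    return points

# Example: one Type A and one Type B document for 2019 score 150 points
docs = [
    {"Year": "2019", "Document_category": "A", "Document_Type": "employment_summary"},
    {"Year": "2019", "Document_category": "B", "Document_Type": "utility_bill"},
]
assert score_year(docs, "2019") == REQUIRED_POINTS_PER_YEAR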
The following demo highlights the solution in action, providing an end-to-end walkthrough of how naturalization applications are processed. It demonstrates the entire workflow, including document upload, information extraction, residency proof scoring, summary generation, and the immigration officer’s review.
The process follows these steps:
1. The applicant uploads all required documents without needing to fill in any fields. In this step, we use an LLM for classification and data extraction from the documents. This saves the applicant time because they only upload documents instead of filling out long forms. The following screenshot shows the Upload documents page of the developed demo.
2. The LLM processes each document, extracting necessary information based on the prompt, and provides a summary of the processed documents. The summary section gives an immediate overview to the applicant based on the documents they provided. This allows the applicant to add any missing documents to their application without waiting for the officer’s review. This summary offers one of two possibilities:
a. Confirm all documents are present and complete, as shown in the following screenshot.
b. Identify any missing documents, as shown in the following screenshot.
3. The immigration officer reviews the application and is presented with the applicant details, a list of the documents provided by the applicant, and a recommendation on the application status, which can be one of two possibilities:
a. Complete, as shown in the following screenshot.
b. Missing some information, as shown in the following screenshot.
Solution walkthrough
This section provides a detailed walkthrough of the solution and its two primary applications of Anthropic’s Claude 3.5 Sonnet LLM: document processing for data extraction and summarization of the extracted information.
The solution uses the multimodal capabilities of Claude 3.5 Sonnet alongside prompt engineering techniques to refine outputs and meet specific requirements with precision.
Techniques such as few-shot prompting, where relevant context is provided through examples, and chain-of-thought prompting, which guides the model to reason step by step toward more accurate responses, play a critical role in enhancing the reliability of results. These strategies are integral to both the extraction and summarization processes, as the prompts later in this post illustrate, helping the solution consistently deliver high-quality outputs tailored to the task at hand.
Figure 7 illustrates the architectural design of the solution.
Steps 1 to 4:
a. The applicant authenticates to the immigration portal using Amazon Cognito. After successful authentication, the applicant is provided with a pre-signed URL to allow them to upload documents securely to the Applicant Documents Amazon Simple Storage Service (Amazon S3) bucket.
b. The file upload process invokes the Process Documents AWS Lambda function.
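As a minimal sketch of step 1a, the portal backend could generate the pre-signed upload URL with boto3 as follows. The bucket name and object key are illustrative placeholders:

import boto3

s3 = boto3.client("s3")

# Bucket and object key are illustrative placeholders
upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={
        "Bucket": "applicant-documents-bucket",
        "Key": "me-200-ctz-11-11-2011/passport.pdf",
    },
    ExpiresIn=300,  # URL validity in seconds
)
# The portal returns upload_url to the applicant's browser,
# which PUTs the document directly to Amazon S3.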
Steps 5 to 6:
a. The Process Documents Lambda function invokes Amazon Bedrock, passing the document to the Claude model. Using a carefully crafted prompt, Anthropic’s Claude extracts the necessary information from the documents. The extracted data is then saved to an Amazon DynamoDB table.
prompt = f'''As an expert document analyst, extract and structure the following information from the given document into JSON format.
Return 'NA' for any unavailable or uncertain fields.
Key points:
1. All documents must include:
- Document name: {document_name}
- Application reference: {application_ref}
- State (always "done")
2. For passports, extract:
- Document_Type (always "passport")
- Forname
- Surname
- Country_birth
- Date of Birth (DOB)
- Nationality
- Passport_ID
- Passport_expire
- If the country appears as an abbreviation, spell it out in full. If a city is mentioned next to the country, return only the country.
- If any value has both English and non-English data, use only the English value. Do not translate.
- Use the example between the <passport_example></passport_example> tags as a reference, and use the exact field names.
3. For all other documents:
- Extract:
- Forname
- Surname
- Address
- Postal code (format: one or two letters followed by a number, then a space, two letters followed by a number)
- Extract the postal code separately and remove it from the address line.
- Document_Type, which can be utility_bill, employment_summary, or Department_Social_Protection
- Year of the document
- Include Document_category:
- Use "A" for Employment Detail Summary or Department of Social Protection documents
- Use "B" for other documents
- Use the example between the <document_example></document_example> tags as a reference, and use the exact field names.
<passport_example>
{{
"application_ref": "me-200-ctz-11-11-2011",
"document_name": "document.pdf",
"state": "done",
"Document_Type": "passport",
"Forname": "Orange",
"Surname": "Fruit",
"Country_birth": "FruitLand",
"DOB": "30-Dec-1985",
"Nationality": "Fruit",
"Passport_ID": "Fruit0000",
"Passport_expire": "13-Jan-2029"
}}
</passport_example>
<document_example>
{{
"application_ref": "me-200-ctz-11-11-2011",
"document_name": "document.pdf",
"state": "done",
"Year": "2019",
"Document_category": "A",
"Document_Type": "utility_bill",
"phone": "0833333333",
"address": "7 Fruit Land",
"postal_code": "F00 FF100",
}}
</document_example>
Provide only the JSON output, no additional text.'''
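A minimal sketch of how the Process Documents Lambda function might pass a document and this prompt to Claude 3.5 Sonnet, here using the Bedrock Converse API’s document block (the function and variable names are illustrative, not the solution’s actual code):

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def extract_document_fields(document_bytes: bytes, prompt: str) -> str:
    """Send the raw document plus the extraction prompt to Claude."""
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{
            "role": "user",
            "content": [
                # The document itself, attached as a PDF content block
                {"document": {"format": "pdf", "name": "applicant-document",
                              "source": {"bytes": document_bytes}}},
                # The extraction prompt shown above
                {"text": prompt},
            ],
        }],
        inferenceConfig={"maxTokens": 2048, "temperature": 0},
    )
    # Claude returns the structured JSON as text
    return response["output"]["message"]["content"][0]["text"]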
Steps 7 to 8:
a. A GraphQL mutation triggers an AWS AppSync request, which invokes the Initiate Application Brief Lambda function.
b. This Lambda function processes the event, retrieves the user’s application ID, and sends two messages to Amazon Simple Queue Service (Amazon SQS): one for the applicant brief and one for the officer brief.
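A minimal sketch of that fan-out, assuming two queues (one per brief). The queue URLs are placeholders; the "user_id/application_ref" body format matches what the consumer function parses below:

import boto3

sqs = boto3.client("sqs")

# Queue URLs are illustrative placeholders
APPLICANT_BRIEF_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/applicant-brief"
OFFICER_BRIEF_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/officer-brief"

def notify_brief_generators(user_id: str, application_ref: str) -> None:
    # Body format "user_id/application_ref" is what the consumers split on
    body = f"{user_id}/{application_ref}"
    for queue_url in (APPLICANT_BRIEF_QUEUE_URL, OFFICER_BRIEF_QUEUE_URL):
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)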
Steps 9 to 11:
a. The Generate Applicant Brief Lambda function retrieves the message from the SQS queue.
b. This Lambda function then queries the Amazon DynamoDB table to retrieve the information associated with the application reference extracted from the SQS message.
import json
import os
import time
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

# Initialize DynamoDB, SQS, and Bedrock runtime clients
# (reading the table name from an environment variable is an assumption)
dynamodb = boto3.resource('dynamodb')
applicants_table = dynamodb.Table(os.environ['APPLICANTS_TABLE'])
sqs = boto3.client('sqs')
bedrock_runtime = boto3.client('bedrock-runtime')

# Anthropic's Claude 3.5 Sonnet on Amazon Bedrock
MODEL_ID = 'anthropic.claude-3-5-sonnet-20240620-v1:0'


class DecimalEncoder(json.JSONEncoder):
    """Serialize DynamoDB Decimal values when dumping items to JSON."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return str(obj)
        return super().default(obj)


def lambda_handler(event, context):
    print(event)
    record = event["Records"][0]
    queue_name = record['eventSourceARN'].split(':')[5]  # Extract queue name from ARN
    queue_url = sqs.get_queue_url(QueueName=queue_name)['QueueUrl']
    receipt_handle = record['receiptHandle']

    # Extend the message visibility timeout while the brief is generated
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=receipt_handle,
        VisibilityTimeout=60  # Adjust as needed
    )

    # The message body has the format "user_id/application_ref"
    parts = record["body"].split("/")
    if len(parts) < 2 or not parts[1]:
        print('Missing application_ref in request')
        return {
            'statusCode': 400,
            'body': json.dumps('Missing application_ref in request')
        }
    user_id, application_ref = parts[0], parts[1]

    # Query DynamoDB for all entries with the provided application_ref (partition key)
    response = applicants_table.query(
        KeyConditionExpression=Key('user_id').eq(user_id) & Key('application').begins_with(application_ref)
    )
    items = response.get('Items', [])

    # Handle pagination if necessary
    while 'LastEvaluatedKey' in response:
        response = applicants_table.query(
            KeyConditionExpression=Key('user_id').eq(user_id) & Key('application').begins_with(application_ref),
            ExclusiveStartKey=response['LastEvaluatedKey']
        )
        items.extend(response.get('Items', []))

    if not items:
        print(f'No entries found for application_ref: {application_ref}')
        return {
            'statusCode': 404,
            'body': json.dumps(f'No entries found for application_ref: {application_ref}')
        }

    # Concatenate all document entries into a single JSON, skipping earlier outputs
    concatenated_json = {}
    for item in items:
        if item.get("document_name") in ("case_summary", "officer_recommendation"):
            continue
        concatenated_json[item['document_name']] = item

    # Convert the concatenated JSON to a string to pass to Claude
    concatenated_json_str = json.dumps(concatenated_json, cls=DecimalEncoder)

    # Invoke Claude with the concatenated JSON; build_summary_prompt (not shown)
    # returns the f-string prompt from the next step
    prompt = build_summary_prompt(application_ref, concatenated_json_str)
    model_response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 2048,
            "messages": [{"role": "user", "content": prompt}]
        })
    )
    extracted_data = json.loads(
        json.loads(model_response['body'].read())['content'][0]['text']
    )

    # Extract specific elements to save to DynamoDB
    summary_data = {
        "user_id": user_id,
        "application": f"{application_ref}#case_summary",
        "application_ref": application_ref,
        "document_name": "case_summary",
        "ts": int(time.time()),
        "state": "processed",
        "year1": extracted_data.get("year1", {"year": "", "note": ""}),
        "year2": extracted_data.get("year2", {"year": "", "note": ""}),
        "year3": extracted_data.get("year3", {"year": "", "note": ""}),
        "case_summary": extracted_data.get("case_summary", ""),
        "application_note": extracted_data.get("application_note", "")
    }

    # Save the extracted data to DynamoDB
    applicants_table.put_item(Item=summary_data)

    # Delete the processed message from the queue
    try:
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle
        )
        print(f"Message deleted successfully: {receipt_handle}")
    except Exception as e:
        print(f"Error deleting message: {str(e)}")

    return {
        'statusCode': 200,
        'body': json.dumps(summary_data, cls=DecimalEncoder)
    }
c. The Lambda function then passes the applicant information to the LLM in Amazon Bedrock. Using a carefully crafted prompt, the LLM evaluates the documents presented by the applicant and provides a summary.
prompt = f'''
Analyze the provided JSON data and extract specific information as follows:
1. From the document with Document_Type "passport" (or any other document if passport is not present):
- First name (Forname)
- Last name (Surname)
- Date of birth (DOB)
- Passport expiry date (Passport_expire)
- Country of birth (Country_birth)
- Passport number (Passport_ID)
- Nationality
Extract English values only and don't mention any non-English values.
2. From the most recent document with Document_category "A":
- Address
3. Add the application_ref: {application_ref}
Then, perform the following checks and analyses:
1. Verify that the passport is present and not expired. Add notes about this in the Passport_document section.
2. Check that there are documents for three distinct years.
3. Ensure that for each year, there is exactly one Document_category "A" and one Document_category "B".
4. For each document, note the first and last names and make sure they match the passport names. If a document has different names, please take note.
5. For each year, add a note section describing the state of the year (e.g., "For year 2020, you have provided one type A document of type employment detail summary and one document of type B of utility bill" or "You are missing one type B document for that year").
6. Add an application_note section summarizing the notes from the three years. If documents are missing, mention "Based on the document notes mentioned, an immigration officer might reach out for further information". If all needed documents are provided for each of the three years and each year scores 150 points, mention "All needed documents are provided, you will receive an update with next steps in due time".
7. Add a detailed case_summary section about your findings, including different names found, missing documents, and any inconsistent data.
8. Add a processed field with the value "Yes".
Here's the JSON data to analyze: {concatenated_json_str}
Please provide your analysis and results in a structured JSON format using this example for reference:
{{
"application_ref": "455-23cr",
"processed": "Yes",
"First_name": "",
"Last_name": "",
"DOB": "",
"Passport_expire": "",
"Country_birth": "",
"Passport_ID": "",
"Passport_document": {{
"notes": ""
}},
"Nationality": "",
"Address": "",
"year1": {{
"year": "",
"note": ""
}},
"year2": {{
"year": "",
"note": ""
}},
"year3": {{
"year": "",
"note": ""
}},
"application_note": "",
"case_summary": ""
}}
Provide only the JSON output, no additional text.
'''
d. The case summary is then saved to Amazon DynamoDB along with the applicant information.
Steps 12 to 14:
These steps focus on the immigration officer’s side of the process:
a. The Generate Officer Brief Lambda function retrieves a message from the SQS queue.
b. This function then queries Amazon DynamoDB to retrieve the applicant’s information based on the reference extracted from the SQS message.
c. The Lambda function passes the applicant information to the LLM in Amazon Bedrock. Using a specialized prompt, the LLM evaluates the applicant’s eligibility for naturalization and provides a recommendation.
prompt = f''' You are an expert citizenship application analyst. Your task is to evaluate the following applicant data against our naturalization rules and provide a recommendation. The data has been extracted from DynamoDB and is presented in JSON format.
Applicant Data:
{combined_json_str}
Please analyze this data against the following rules:
1. Passport Requirement:
- The applicant must provide a valid, non-expired passport.
2. Residency Proof and Scoring:
- The application checks residency based on 3 years.
- For each year, the applicant should provide:
a) One Type A document (scored 100 points)
b) One Type B document (scored 50 points)
- The total score for each year must be 150 points.
- If multiple documents of the same type are provided for a year, only one will be scored.
3. Eligibility Criteria:
- The applicant is eligible for naturalization if they score 150 points for each of the three years.
4. Name Matching:
- For each document, verify that the names provided match the passport name.
- The first and last names must match exactly.
- If there's an extra middle name or minor discrepancy, make a note but do not disqualify the application.
Based on your analysis, please provide:
1. An evaluation of whether the applicant meets each rule
2. A list of any missing or incomplete requirements
3. A recommendation on whether to approve, reject, or request more information
4. If more information is needed, specify what documents or proofs are required
5. If the application should be rejected, explain why
6. Any notes on name discrepancies, if applicable
Please format your response as follows:
Passport Evaluation:
[Evaluate if a valid, non-expired passport is provided]
Residency Proof and Scoring:
Year 1: [Evaluate documents and score]
Year 2: [Evaluate documents and score]
Year 3: [Evaluate documents and score]
Name Matching:
[Evaluate name matching across documents against the passport name. If a customer-provided name exists, check it as well; it might be NA or empty if the customer did not add one, in which case ignore it. Note any discrepancies.]
Missing Requirements:
[List any missing or incomplete requirements]
Recommendation:
[Your recommendation: Approve, Reject, or Request More Information]
Additional Information Needed (if applicable):
[Specify any additional documents or proofs required]
Reason for Rejection (if applicable):
[Explain why the application should be rejected]
Notes:
[Any additional notes, including minor name discrepancies]
Please ensure your analysis is thorough and considers all aspects of the naturalization rules.'''
d. The eligibility assessment and recommendation are saved back to Amazon DynamoDB, associated with the applicant’s record.
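As a minimal sketch, that final write could look like the following; the table handle and attribute names mirror the case_summary record saved earlier, and the recommendation field name is an illustrative assumption:

import time

def save_officer_recommendation(applicants_table, user_id: str,
                                application_ref: str, recommendation_text: str) -> None:
    """Persist Claude's recommendation next to the applicant's record."""
    applicants_table.put_item(Item={
        "user_id": user_id,
        "application": f"{application_ref}#officer_recommendation",
        "application_ref": application_ref,
        "document_name": "officer_recommendation",  # matches the name skipped during brief generation
        "ts": int(time.time()),
        "state": "processed",
        "recommendation": recommendation_text,  # illustrative attribute name
    })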
Conclusion
This use case demonstrates the transformative potential of Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet in streamlining the naturalization application process. By using AI capabilities, you can:
- Reduce manual document review time.
- Provide instant feedback to applicants on their application status.
- Scale to handle large volumes of applications without compromising accuracy.
Although this use case focuses on naturalization applications, the same principles and technologies can be applied to a wide range of document-intensive processes across the public sector.
Are you ready to transform your document processing workflows with AI? Explore Amazon Bedrock and see how it can bring efficiency and intelligence to your public sector operations. Contact your AWS account team to discuss how this solution can be tailored to your specific needs.