Artificial Intelligence

Using Strands Agents to create a multi-agent solution with Meta’s Llama 4 and Amazon Bedrock

Multi-agent solutions, in which networks of agents collaborate, coordinate, and reason together, are changing how we approach real-world challenges. Enterprises manage environments with multiple data sources, changing goals, and various constraints. This is where multi-agent architectures shine. By empowering multiple agents that each have specialized tools, memory, or perspectives to interact and reason as a collective, organizations unlock powerful new capabilities:

  • Scalability – Multi-agent frameworks handle tasks of growing complexity, distributing workload intelligently and adapting to scale in real time.
  • Resilience – When agents work together, failure in one can be compensated or mitigated by others, creating robust, fault-tolerant systems.
  • Specialization – Individual agents excel in specific domains (such as finance, data transformation, and user support) yet can seamlessly cooperate to solve cross-disciplinary problems.
  • Dynamic problem solving – Multi-agent systems can rapidly reconfigure, pivot, and respond to change, which is essential in volatile business, security, and operations environments.

Recent launches in agentic AI frameworks, such as Strands Agents, make it easier for developers to participate in the creation and deployment of model-driven, multi-agent solutions. You can define prompts and integrate toolsets, allowing robust language models to reason, plan, and invoke tools autonomously rather than relying on handcrafted, brittle workflows.
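To make this concrete, the following minimal sketch (an illustrative example, not part of the solution in this post) shows how a Strands agent can be given a custom tool that the model decides when to invoke. By default the agent uses an Amazon Bedrock-hosted model, so Bedrock model access is assumed:

from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

# The model reasons over the request and decides on its own when to call the tool
agent = Agent(tools=[word_count])
agent("How many words are in this sentence?")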

In production, services such as Amazon Bedrock AgentCore support secure, scalable deployment with features like persistent memory, identity integration, and enterprise-grade observability. This shift towards collaborative, multi-agent AI solutions is revolutionizing software architectures by making them more autonomous, resilient, and adaptable. From real-time troubleshooting in cloud infrastructure to cross-team automation in financial services and chat-based assistants coordinating complex multistep business processes, organizations adopting multi-agent solutions are positioning themselves for greater agility and innovation. Now, with open frameworks such as Strands, anyone can start building intelligent systems that think, interact, and evolve together.

In this post, we explore how to build a multi-agent video processing workflow using Strands Agents, Meta’s Llama 4 models, and Amazon Bedrock to automatically analyze and understand video content through specialized AI agents working in coordination. To showcase the solution, we walk you through the code in Amazon SageMaker AI.

Meta’s Llama 4: Unlocking the value of 1M+ context windows

Llama 4 is Meta’s latest family of large language models (LLMs), and it stands out for its long context windows and multimodal intelligence. The family’s two variants, Llama 4 Scout and Llama 4 Maverick, both use a mixture-of-experts (MoE) architecture for efficiency, accept multimodal inputs, and are optimized to power agentic systems and complex workflows. Meta’s Llama 4 Scout supports a 10 million token context window (an industry first), enabling the model to process and reason over large amounts of data in a single prompt.

This supports applications such as summarizing entire libraries of books, analyzing massive codebases, conducting holistic research across thousands of documents, and maintaining deep, persistent conversation context across long interactions. The Llama 4 Maverick variant also offers a 1 million token window, making it suitable for demanding language, vision, and cross-document tasks. These ultralong context windows open new possibilities for advanced summarization, memory retention, and complex, multistep workflows, positioning Meta’s Llama 4 as a versatile solution for both research and enterprise-grade AI applications.

| Model name | Context window | Key capabilities and use cases |
| --- | --- | --- |
| Meta’s Llama 4 Scout | 10M tokens (up to 3.5M using Amazon Bedrock) | Ultralong document processing, entire book or codebase ingestion, large-scale summarization, extensive dialogue memory, advanced research |
| Meta’s Llama 4 Maverick | 1M tokens | Large context multimodal tasks, advanced document and image understanding, code analysis, comprehensive Q&A, robust summarization |
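As a quick illustration of calling these models directly, the following minimal sketch (assuming model access is enabled in your AWS account and Region) sends a single request to Meta’s Llama 4 Maverick through the Amazon Bedrock Converse API:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock_runtime.converse(
    modelId="us.meta.llama4-maverick-17b-instruct-v1:0",  # cross-Region inference profile
    messages=[
        {"role": "user", "content": [{"text": "Summarize the benefits of long context windows in two sentences."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0},
)
print(response["output"]["message"]["content"][0]["text"])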

Solution overview

This post demonstrates how to build a multi-agent video processing workflow by using the Strands Agents SDK, Meta’s Llama 4 with its multimodal capabilities and long context window, and the scalable infrastructure of Amazon Bedrock. Although this post focuses primarily on building specialized agents for this video analysis solution, the same practices for creating a multi-agent workflow can be used to build your own adaptable, automated solution at the enterprise level.

For scaling, this approach extends naturally to handle larger and more diverse workloads, such as processing video streams from millions of connected devices in smart cities, industrial automation for predictive maintenance through continuous video and sensor data analysis, real-time surveillance systems across multiple locations, or media companies managing vast libraries for indexing and content retrieval. Using the Strands Agents built-in integration with Amazon Web Services (AWS) services and the managed AI infrastructure of Amazon Bedrock means that your multi-agent workflows can elastically scale, distribute tasks efficiently, and maintain high availability and fault tolerance. You can build complex, multistep workflows across heterogeneous data sources and use cases—from live video analytics to personalized media experiences—while maintaining the agility to adapt and expand as business needs evolve.

Introduction to agentic workflows using Strands Agents

This post demonstrates a video processing solution that implements an agent workflow using six specialized agents. Each agent performs a specific role, passing its output to the next agent to complete the multistep task. The workflow follows an orchestrator pattern, similar to a deep research architecture, in which an orchestrator agent coordinates the other agents working in tandem. In Strands Agents, this concept is called Agents as Tools.

This architectural pattern in AI systems allows for specialized AI agents to be wrapped as callable functions (tools) that can be used by other agents. This agentic workflow has the following specialized agents:

  1. Llama4_coordinator_agent – Has access to the other agents and kicks off the process, from frame extraction through summary generation
  2. s3_frame_extraction_agent – Uses the OpenCV library to extract meaningful frames from videos, handling the complexity of video file operations
  3. s3_visual_analysis_agent – Has the tools to process the frames by analyzing each image and storing the results as a JSON file in the provided Amazon Simple Storage Service (Amazon S3) bucket
  4. retrieve_json_agent – Retrieves the frame analysis in the form of a JSON file
  5. temporal_analysis_agent – Specializes in temporal reasoning over the video frame sequence by analyzing the images chronologically
  6. summary_generation_agent – Creates a summary of the temporal analysis of the images

Modularizing the video analysis solution with Agents as Tools

The process begins with the orchestrator agent, implemented using Meta’s Llama 4, which coordinates communication and task delegation among specialized agents. This central agent initiates and monitors each step of the video processing pipeline. Using the Agents as Tools pattern in Strands Agents, each specialized agent is wrapped as a callable function (tool), enabling seamless inter-agent communication and modular orchestration. This hierarchical delegation pattern allows the coordinator agent to dynamically invoke domain-specific agents, reflecting how collaborative human teams function, and it provides the following benefits:

  • Customizability – Each agent’s system prompt can be independently tuned for optimal performance in its specialized task.
  • Separation of concerns – Agents focus on what they do best, making the system more straightforward to develop and maintain.
  • Workflow flexibility – The coordinator agent can orchestrate components in different sequences for various use cases.
  • Scalability – Components can be optimized individually based on their specific performance requirements.
  • Extensibility – New capabilities can be added by introducing new specialized agents without disrupting existing ones.
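In practice, wrapping an agent as a tool only requires exposing it through a function decorated with @tool. The following is a minimal sketch of how the frame extraction agent might be wrapped as the run_frame_extraction tool (illustrative only; the repository’s actual wrapper includes more prompt detail and error handling):

from strands import tool

@tool
def run_frame_extraction(s3_video_uri: str) -> str:
    """Ask the frame extraction agent to pull key frames from the video at s3_video_uri
    and return the S3 folder that contains the extracted frames."""
    result = s3_frame_extraction_agent(
        f"Extract key frames from the video at {s3_video_uri}, upload them to S3, "
        "and return the S3 folder that contains the frames."
    )
    return str(result)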

By turning agents into tools, we create building blocks that can be combined to solve complex video understanding tasks, demonstrating how you can use Strands Agents to support multi-agent systems with specialized LLM-based reasoning. Let’s examine the coordinator_agent:

def new_llama4_coordinator_agent() -> Agent:
    """
    Factory constructor: creates a NEW agent instance with a fresh conversation history.
    Use this per video request for clean isolation.
    """
    return Agent(
        system_prompt="""You are a video processing coordinator. Your job is to process videos step by step.
##When asked to process a video:
1. Extract frames from S3 video using run_frame_extraction
2. Use the frame location from step 1 to run_visual_analysis
3. WAIT for visual analysis to complete sending the json to s3
4. Use the retrieve_json agent to extract the json from step 3
5. Use the text result of retrieve_json_from_s3 by passing it to run_temporal_reasoning
6. Pass the result from temporal reasoning to run_summary_generation
7. Upload analysis generated in run_summary_generation and return s3 location
##IMPORTANT:
- Call ONE tool at a time and wait for the result
- Use the EXACT result from the previous step as input
- Do NOT call multiple tools simultaneously
- Do NOT return raw JSON or function call syntax
""",
        model=bedrock_model,
        tools=[
            run_frame_extraction,
            run_visual_analysis,
            run_temporal_reasoning,
            run_summary_generation,
            upload_analysis_results,
            retrieve_json_from_s3,
        ],
    )

Calling the coordinator_agent triggers the agent workflow to call the s3_frame_extraction_agent. This specialized agent has the necessary tools to extract key frames from the input video using OpenCV, upload the frames to Amazon S3, and identify the folder path to pass off to the run_visual_analysis agent. The following diagram shows this flow.
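Under the hood, the frame extraction tools rely on OpenCV. As a rough, hypothetical sketch (simplified from whatever the repository implements, with names chosen here for illustration), sampling every Nth frame from a locally downloaded copy of the video could look like this:

import cv2  # OpenCV

def extract_sample_frames(local_video_path: str, every_n_frames: int = 30) -> list:
    """Save every Nth frame of a video as a JPEG and return the local file paths."""
    capture = cv2.VideoCapture(local_video_path)
    frame_paths, index = [], 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        if index % every_n_frames == 0:
            frame_path = f"frame_{index:05d}.jpg"
            cv2.imwrite(frame_path, frame)
            frame_paths.append(frame_path)
        index += 1
    capture.release()
    return frame_paths

The extracted JPEGs are then uploaded to the same S3 folder as the source video so that the downstream agents can find them.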

After the frames are stored in Amazon S3, the visual_analysis_agent will have access to tools that list the frames from the S3 folder, use Meta’s Llama in Amazon Bedrock to process the images, and upload the analysis as a JSON file to Amazon S3.

The following code walks you through the key parts of each agent, starting with the visual_analysis_agent:

# Imports assumed for this snippet (bedrock_model is the BedrockModel instance defined later in this post)
from random import randint

import boto3
from strands import Agent, tool

@tool
def upload_local_json_to_s3(s3_video_path: str, local_filename: str = "visual_analysis_results.json") -> str:
    """Upload local JSON file to S3 bucket in video folder"""
    try:
        s3_parts = [part for part in s3_video_path.replace('s3://', '').split('/') if part]
        bucket = s3_parts[0]
        video_folder = s3_parts[-1]
        
        if '_' in video_folder:
            base_video_name = video_folder.split('_')[0]
        else:
            base_video_name = video_folder
        random_num = randint(1000, 9999)
        
        s3_key = f"videos/{base_video_name}/{random_num}_{local_filename}"
        
        s3_client = boto3.client('s3')
        s3_client.upload_file(local_filename, bucket, s3_key)
        
        return f"s3://{bucket}/{s3_key}"
    except Exception as e:
        return f"Error uploading file: {str(e)}"

s3_visual_analysis_agent = Agent(
    system_prompt="""You are an image analysis agent that processes frames from S3 buckets.

Your workflow:
1. Use the available tools to analyze images
2. Use the video path folder to place the analysis results

IMPORTANT:
- Do NOT generate, write, or return any code
- Focus on describing what you see in the images
- Images are automatically resized if too large
- Put numbered labels in front of each image description (e.g., "1. ", "2. ", etc.)
- Always save analysis results locally first, then upload to S3

Return Format:
The uri from the upload_local_json_to_s3 tool""",
    model=bedrock_model,
    callback_handler=None,
    tools=[list_s3_frames, analyze_image, analyze_all_frames, analyze_frames_batch, upload_local_json_to_s3],
)
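The agent’s remaining tools (list_s3_frames, analyze_image, analyze_all_frames, and analyze_frames_batch) follow the same @tool pattern. As one example, here is a minimal sketch of what list_s3_frames might look like; this is a hypothetical implementation using the same imports as the preceding snippet, and the repository’s version may differ:

@tool
def list_s3_frames(s3_folder_uri: str) -> str:
    """List the image frames stored under an S3 folder, one S3 URI per line."""
    parts = s3_folder_uri.replace('s3://', '').split('/', 1)
    bucket = parts[0]
    prefix = parts[1] if len(parts) > 1 else ''

    s3_client = boto3.client('s3')
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    frame_keys = [
        obj['Key'] for obj in response.get('Contents', [])
        if obj['Key'].lower().endswith(('.jpg', '.jpeg', '.png'))
    ]
    return "\n".join(f"s3://{bucket}/{key}" for key in sorted(frame_keys))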

After uploading the JSON to Amazon S3, there is a specialized agent that retrieves the JSON file from Amazon S3 and analyzes the text:

# Imports assumed for this snippet
import json
import os

import boto3
from strands import Agent, tool
from strands.models import BedrockModel

@tool
def process_s3_analysis_json(s3_uri: str) -> str:
    """Retrieve JSON from S3 and extract only the analysis text"""
    try:
        # Parse S3 URI and download JSON
        s3_parts = s3_uri.replace('s3://', '').split('/', 1)
        bucket = s3_parts[0]
        key = s3_parts[1]
       
        s3_client = boto3.client('s3')
        response = s3_client.get_object(Bucket=bucket, Key=key)
        json_content = response['Body'].read().decode('utf-8')
        
        # Parse and extract text
        data = json.loads(json_content)
        
        # Handle both formats
        if 'analyses' in data:
            analyses = data['analyses']
        elif 'sessions' in data:
            analyses = [session['data'] for session in data['sessions'] if 'data' in session]
        else:
            return "Error: No 'analyses' or 'sessions' field found"
        
        # Extract text only
        text_only = []
        for analysis in analyses:
            if 'analysis' in analysis:
                text = analysis['analysis']
                if not text.startswith("Failed:"):
                    text_only.append(text)
        
        # Clean up local file
        local_file = "visual_analysis_results.json"
        if os.path.exists(local_file):
            os.remove(local_file)
        
        return "\n".join(text_only)
    except Exception as e:
        return f"Error processing {s3_uri}: {str(e)}"


bedrock_model = BedrockModel(
    model_id='us.meta.llama4-maverick-17b-instruct-v1:0',
    region_name=region,
    streaming=False,
    temperature=0
)  

retrieve_json_agent = Agent(
    system_prompt="Call process_s3_analysis_json with the S3 URI. Your response must be the exact text output from the tool, nothing else.",
    model=bedrock_model,
    callback_handler=None,
    tools=[process_s3_analysis_json],
)

This output is then fed to the temporal_analysis_agent, which gains temporal awareness of the sequences in the video frames and provides a detailed description of the visual content.

After the temporal analysis output has been generated, the summary_generation_agent will be kicked off to provide the final summary.
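Both of these agents can be plain prompt-driven agents with no extra tools. The following is a minimal sketch of what they might look like (the system prompts here are illustrative, not the repository’s exact wording):

temporal_analysis_agent = Agent(
    system_prompt="""You are a temporal reasoning agent. Given per-frame image descriptions
in chronological order, reconstruct the sequence of events and describe how the scene
changes from frame to frame.""",
    model=bedrock_model,
    callback_handler=None,
)

summary_generation_agent = Agent(
    system_prompt="""You are a summarization agent. Given a temporal analysis of video
frames, produce a concise summary covering what happens in the video, the key visual
elements, and the overall narrative.""",
    model=bedrock_model,
    callback_handler=None,
)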

Prerequisites and setup

To run the solution on either the notebook or the Gradio UI, you need the following:

  1. An AWS account with access to Amazon Bedrock.

To copy over the project, complete the following steps:

  1. Clone the Meta-Llama-on-AWS GitHub repository and change into the project directory:
git clone https://github.com/aws-samples/Meta-Llama-on-AWS.git
cd Meta-Llama-on-AWS/agents/strands/Bedrock/multi-agent-video-processing/
  2. In your terminal, install the required dependencies:
pip install -r requirements.txt

Deploy video processing app on Gradio

To deploy the video processing app on Gradio, follow these application launch instructions:

  1. To launch the application environment, open a terminal with Python 3 installed
  2. To install dependencies, execute pip install commands for the required libraries (refer to the preceding library installation section)
  3. To execute the application, run the command python3 gradio_app.py
  4. To access the interface, choose the generated hosted link displayed in the terminal
  5. To initiate video processing, upload your video file through the interface and then choose Run

The Llama video analysis assistant provides the following output for the video buglifeflik.mp4, which is provided in the GitHub repository:

Llama Video Analysis Log
Flik is shown determined in front of a tree.
He interacts with other insects.
Flik gathers items and constructs a device.
He presents the invention to a group of insects.
The group reacts with skepticism.
Flik is chased by a group of birds.

Key visual elements:
The key visual elements include Flik’s determined expression, his interaction with other insects, the items he gathers, the complex device he constructs, the group’s skeptical reaction, and the chaotic scene of Flik being chased by birds.
Overall Narrative:
The narrative follows Flik’s journey as he prepares and presents an invention, faces rejection, and experiences a dramatic consequence. The story is character-driven, showcasing Flik’s actions and their outcomes, and builds up to a climactic event.

The following screenshot shows the Gradio UI with this output.

Running in the Jupyter Notebook

After the necessary libraries are imported, you need to manually upload your video to your S3 bucket:

# Imports assumed for this snippet
import os

import boto3

def upload_to_sagemaker_bucket(local_video_path, base_folder="videos/"):
    s3 = boto3.client('s3')

    # Get default SageMaker bucket
    account_id = boto3.client('sts').get_caller_identity()['Account']
    region = boto3.Session().region_name
    bucket_name = f"sagemaker-{region}-{account_id}"
    # Get filename and create subfolder name
    filename = os.path.basename(local_video_path)
    filename_without_ext = os.path.splitext(filename)[0]
    # Create the full S3 path: videos/filename_without_ext/filename
    s3_key = os.path.join(base_folder, filename_without_ext, filename)
    # Upload file
    s3.upload_file(local_video_path, bucket_name, s3_key)  
    s3_uri = f"s3://{bucket_name}/{s3_key}"
    print(f"Uploaded to {s3_uri}")  

    s3_folder_path = os.path.join(base_folder, filename_without_ext)
    s3_folder_uri = f"s3://{bucket_name}/{s3_folder_path}"

    return s3_folder_uri

# Example usage: provide your local video path here (for example, the sample video included in the repository)
local_video_path = "buglifeflik.mp4"
s3_video_uri = upload_to_sagemaker_bucket(local_video_path)

After the video is uploaded, you can start the agent workflow by instantiating a new agent with fresh conversation history:

# Start the workflow
agent = new_llama4_coordinator_agent()
video_instruction = f"Process a video from {s3_video_uri}. Use tools in this order: run_frame_extraction, run_visual_analysis, retrieve_json_from_s3, run_temporal_reasoning, run_summary_generation, upload_analysis_results"
response = agent(video_instruction)
print(response)

Tool #1: run_frame_extraction

Tool #2: run_visual_analysis

Tool #3: retrieve_json_from_s3

Tool #4: run_temporal_reasoning

Tool #5: run_summary_generation

Tool #6: run_summary_generation
**What happens in the video:**
The video follows Flik as he navigates through a series of events, starting from being cautious in a natural setting, seeking help or communicating with other insects, participating in a crucial discussion or planning, and finally taking action with the group.

**Chronological Sequence of Events:**
The sequence begins with Flik being cautious near a tree, followed by him approaching a group of insects, then being part of a significant gathering or discussion, and concludes with Flik and the insects taking action together.

**Sequence of events:**
1. Flik is initially seen being cautious in a natural environment.
2. He then approaches a group of insects, likely to communicate or seek help.
3. A gathering of insects is shown with Flik at the center, indicating a crucial discussion or planning.
4. The final scene shows Flik and the insects in action, possibly executing a plan or facing a challenge.

**Key visual elements:**
The key visual elements include Flik's cautious initial stance, his interaction with other insects, the gathering or discussion, and the final action scene, highlighting the progression from solitude to collective action.

**Overall Narrative:**
The narrative follows Flik's journey from caution and seeking help to participating in a crucial discussion and finally to taking action with a group of insects, suggesting a story arc that involves progression, planning, and collective action.
Tool #7: upload_analysis_results
The video processing is complete. The final analysis results are saved to s3://sagemaker-us-west-2-333633606362/videos/buglifeflik/analysis_results_20250818_190012.json.

Cleanup

To avoid incurring unnecessary future charges, clean up the resources you created as part of this solution.

To delete the Amazon S3 files:

  1. Open the AWS Management Console
  2. Navigate to Amazon S3
  3. Find and select your Amazon SageMaker bucket
  4. Select the video files you uploaded
  5. Choose Delete and confirm

To stop and remove the SageMaker notebook:

  1. Go to Amazon SageMaker AI in the AWS Management Console
  2. Choose Notebook instances
  3. Select your notebook
  4. Choose Stop if it’s running
  5. After it has stopped, choose Delete

Conclusion

This post highlights how combining the Strands Agents SDK with Meta’s Llama 4 models and Amazon Bedrock infrastructure enables building advanced, multi-agent video processing workflows. By using highly specialized agents that communicate and collaborate through the Agents as Tools pattern, developers can modularize complex tasks such as frame extraction, visual analysis, temporal reasoning, and summarization. This separation of concerns enhances maintainability, customization, and scalability while allowing seamless integration across AWS services.

We encourage developers to explore and extend this architecture by adding new specialized agents and adapting workflows to diverse use cases—from smart cities and industrial automation to media content management. The combination of Strands Agents, Meta’s Llama 4, and Amazon Bedrock lays a robust foundation for creating autonomous, resilient AI solutions that tackle the complexity of modern business environments.

To get started, visit the official GitHub repository for the Meta-Llama-on-AWS agents project for code examples and deployment instructions. For further insights on building with Strands Agents, explore the Strands Agents documentation, which offers a code-first approach to integrating modular AI agents. For broader context on multi-agent AI architectures and orchestration, AWS blog posts on agent interoperability and autonomous agent frameworks provide valuable guidance shaping the future of intelligent systems.


About the authors

Sebastian Bustillo is an Enterprise Solutions Architect at Amazon Web Services (AWS), where he works with airlines, and is an active member of the AI/ML Technical Field Community. At AWS, he helps customers unlock business value through AI. Outside of work, he enjoys spending time with his family and exploring the outdoors. He’s also passionate about brewing specialty coffees.