AWS Storage Blog
Optimize agent tool selection using Amazon S3 Vectors and Amazon Bedrock Knowledge Bases
State-of-the-art AI agents rely on external tools to perform actions on their behalf. A tool is a function with a clear description, defined inputs, and outputs that extends the capabilities of a large language model (LLM). As toolkits expand, selecting the right tool for each task requires effective mechanisms. Semantic search is one such mechanism: it lets agents match the contextual meaning of a task against tool capabilities to make more intelligent choices. For example, an agent attempting to “analyze customer feedback sentiment” should be able to discover relevant tools even if they are named differently, such as “emotion analyzer” or “opinion classifier.” As AI agents become more sophisticated, they may need to reason over hundreds of available tools, making semantic search essential for narrowing down relevant options before making final selection decisions.
Amazon Bedrock AgentCore Gateway is a fully managed solution that provides a unified interface where agents can discover, access, and invoke tools with semantic tool selection built in. This managed service handles the complexity of tool discovery and selection, making it the recommended approach for most production use cases. However, some customers prefer to build custom solutions or need more control over their tool selection implementation. For these custom implementations, the recommended approach is to use vector-based semantic search to efficiently filter and retrieve relevant tools before passing them to the AI agent for final selection.
In this post, we show how Amazon S3 Vectors can serve as an effective backend vector store for an Amazon Bedrock Knowledge Base in custom implementations, enabling vector-based semantic search to improve performance and lower costs for those building their own agentic systems. We build a full agentic system with semantic tool search capabilities and test it on public datasets. Our results demonstrate that using S3 Vectors for semantic tool selection significantly outperforms the baseline approach of passing all tools in context, with improved accuracy, lower latency, and substantial cost savings, particularly for small-scale use cases with variable query patterns.
Introducing Amazon S3 Vectors
S3 Vectors is the first cloud object store with native support to store and query vectors, providing purpose-built and cost-optimized vector storage in S3. Vector search, the backbone for the semantic search methodology used pervasively throughout generative AI applications, is critical for use cases such as Retrieval Augmented Generation (RAG), text and image similarity searches, and recommendation systems.
S3 Vectors is now generally available, and it transforms vector storage and queries into a competitive advantage by delivering billion-scale vector storage at up to 90% lower costs for storing, uploading, and querying vectors than specialized vector database solutions. S3 Vectors now supports up to 2 billion vectors per index for large-scale production workloads without complex index management. With strong consistency, newly ingested vectors are immediately available for querying, enabling seamless near real-time operations without requiring index rebuilding. Furthermore, S3 Vectors delivers up to 1,000 transactions per second when streaming single-vector updates into your indexes and query latency of around 100 milliseconds or less for frequently run queries. Together, these capabilities enable accelerated AI responses and interactive use cases. With a fully serverless architecture and a pay-as-you-use model, developers can use S3 Vectors to experiment and prototype quickly and economically, then scale seamlessly to production workloads.

Evaluation approach for agentic tool selection
To evaluate vector search for tool selection, we built an agentic system with real-world, tool-calling behavior. Rather than measuring retrieval accuracy in isolation, we evaluate semantic search through the final tool selection to understand how the vector store performs in production scenarios. The experiment focuses on identifying the right tool and comparing it with ground truth, rather than calling the tool and returning the actual response. The following diagram shows the agentic architecture:

We implemented an AI agent using LangGraph, Amazon Bedrock, and Amazon Bedrock Knowledge Bases that performs tool selection with a semantic search element to narrow down the choice of tools.
The data flows through the agentic system in the following steps (a minimal code sketch follows the list):
- Task input, in the form of a user question, is submitted to the agent. An example task input might be, “Is it raining in San Francisco right now?”.
- The agent invokes an Amazon Bedrock LLM to convert the task into a natural language query better suited for a semantic tool search. An example query might be, “find the current weather in a specific city”.
- The agent submits the query to the Amazon Bedrock Knowledge Base Retrieve API, which embeds the query and performs a similarity search against the backend vector store.
- The Amazon Bedrock Knowledge Base returns the top-k most similar tools back to the agent. One such tool might look like (shortened for readability): `{ "tool_name": "check_current_weather", "description": "Retrieve current weather forecast for a specific city", "parameters": [{"name": "city", "type": "string"}] }`
- The agent invokes an Amazon Bedrock LLM to choose the one tool best suited to address the user question.
- The single best tool is returned as the system’s output.
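The following is a minimal sketch of steps 2 through 6 using boto3 and the Amazon Bedrock Converse and Retrieve APIs. The knowledge base and model IDs are placeholders, and the prompts are simplified for readability; our actual implementation orchestrates the same calls with LangGraph.

```python
import boto3

KB_ID = "YOUR_KB_ID"        # placeholder: knowledge base backed by an S3 Vectors index
MODEL_ID = "YOUR_MODEL_ID"  # placeholder: for example, a Claude Haiku 4.5 model ID in your Region

agent_rt = boto3.client("bedrock-agent-runtime")
bedrock_rt = boto3.client("bedrock-runtime")


def converse(prompt: str) -> str:
    """Single-turn LLM call with temperature 0.0 for more deterministic output."""
    resp = bedrock_rt.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0},
    )
    return resp["output"]["message"]["content"][0]["text"]


def select_tool(task: str, top_k: int = 20) -> str:
    # Step 2: rewrite the task as a query suited to semantic tool search.
    query = converse(f"Rewrite this task as a short tool-search query: {task}")

    # Steps 3-4: the Retrieve API embeds the query and returns the top-k most similar tools.
    retrieved = agent_rt.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": top_k}},
    )
    tool_docs = [r["content"]["text"] for r in retrieved["retrievalResults"]]

    # Steps 5-6: the LLM picks the single best tool from the retrieved candidates.
    return converse(
        f"Task: {task}\n\nCandidate tools:\n\n" + "\n\n".join(tool_docs)
        + "\n\nReply with only the name of the single best tool."
    )


print(select_tool("Is it raining in San Francisco right now?"))
```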
In a full end-to-end solution, the agent would then call the selected tool and return its response to the user. However, we limit our focus to tool selection for this experiment.
To help quantify and demonstrate the value of vector search, we also implemented a baseline approach where all available tools are provided directly to the LLM in a single prompt without any retrieval step, which isn’t shown in the preceding architecture.
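For comparison, here is a sketch of the baseline approach, reusing the converse helper from the previous sketch (all_tool_docs is assumed to hold the documentation for all 422 tools):

```python
def select_tool_baseline(task: str, all_tool_docs: list[str]) -> str:
    # No retrieval step: every tool document goes into the prompt.
    return converse(
        f"Task: {task}\n\nAvailable tools:\n\n" + "\n\n".join(all_tool_docs)
        + "\n\nReply with only the name of the single best tool."
    )
```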
MCPVerse benchmark dataset
We used the MCPVerse benchmark dataset, a comprehensive evaluation suite for agentic tool selection. MCPVerse provides hundreds of real-world, executable tools from Model Context Protocol (MCP) servers. For each task, the model identifies that external resources are required, selects the appropriate tool, runs it with the necessary parameters, integrates the output into its response, and adapts its approach based on the tool’s results.
For example, MCPVerse contains a tool for retrieving World Bank indicators:
```json
{
  "name": "get_indicator_for_country",
  "description": "Get values for an indicator for a specific country from the World Bank API",
  "parameters": {
    "type": "object",
    "properties": {
      "country_id": {
        "type": "string",
        "description": "The ID of the country for which the indicator is to be queried"
      },
      "indicator_id": {
        "type": "string",
        "description": "The ID of the indicator to be queried"
      }
    },
    "required": ["country_id", "indicator_id"]
  }
}
```
When asked a question such as “What is the GDP of the United States in 2023?”, the agent decides to use this tool and runs it with the appropriate parameters, such as `country_id` set to `US`. The tool returns the GDP value, which the agent integrates into a clear, concise response for the user.
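Although tool execution is out of scope for this experiment, the following sketch illustrates how the selected tool could be run through the Amazon Bedrock Converse API, which accepts tool specifications in this JSON schema format. It reuses the bedrock_rt client and MODEL_ID placeholder from the earlier sketch, with the schema descriptions abbreviated:

```python
TOOL_SPEC = {
    "toolSpec": {
        "name": "get_indicator_for_country",
        "description": "Get values for an indicator for a specific country from the World Bank API",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {
                "country_id": {"type": "string", "description": "The ID of the country"},
                "indicator_id": {"type": "string", "description": "The ID of the indicator"},
            },
            "required": ["country_id", "indicator_id"],
        }},
    }
}

resp = bedrock_rt.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "What is the GDP of the United States in 2023?"}]}],
    toolConfig={"tools": [TOOL_SPEC]},
)

# When the model decides to call the tool, stopReason is "tool_use" and the
# requested parameters (for example, country_id="US") appear in a toolUse block.
if resp["stopReason"] == "tool_use":
    for block in resp["output"]["message"]["content"]:
        if "toolUse" in block:
            print(block["toolUse"]["name"], block["toolUse"]["input"])
```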
From this collection of tools, we extracted 422 tools that can be used without requiring API keys. These tools cover diverse capabilities, from maps and weather to code execution and file operations. The documentation of each tool, in JSON format, was added to the vector store using the Amazon Titan Text Embeddings V2 model. Each document included the name, properties, parameters, and other information required for tool selection and usage. For chunking, we used a tool-based strategy, embedding each tool’s documentation as a single chunk. For this evaluation, we focused on 62 single-tool tasks, where the agent is expected to use a single tool to complete the given task.
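In our setup, ingestion is handled by the Knowledge Base. As an illustration, the following sketch shows the equivalent direct path: embedding each tool document as a single chunk with Titan Text Embeddings V2 and writing the vectors to an S3 Vectors index with the s3vectors client. The bucket and index names are placeholders, and batching limits are omitted for brevity.

```python
import json

import boto3

bedrock_rt = boto3.client("bedrock-runtime")
s3vectors = boto3.client("s3vectors")

VECTOR_BUCKET = "my-tool-vectors"  # placeholder vector bucket name
INDEX_NAME = "tool-index"          # placeholder index name


def embed(text: str) -> list[float]:
    """Embed one tool document as a single chunk with Titan Text Embeddings V2."""
    resp = bedrock_rt.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]


def index_tools(tools: list[dict]) -> None:
    # One vector per tool: the entire JSON document is the chunk.
    vectors = [
        {
            "key": tool["name"],
            "data": {"float32": embed(json.dumps(tool))},
            "metadata": {"tool_doc": json.dumps(tool)},
        }
        for tool in tools
    ]
    s3vectors.put_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        vectors=vectors,
    )
```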
Evaluation metrics
We measured five key metrics to evaluate system performance (a scoring sketch follows the list):
- Accuracy: tracks the percentage of tasks where the agent selected the correct tool
- Recall: measures the percentage of tasks where the correct tool appeared in the top 20 retrieved results
- End-to-end latency: measures the total time from receiving a task to producing a tool selection, which includes the time taken to generate the query, run it to get the top 20 tools back, and have the LLM pick the top tool
- Retrieval time: specifically measures query time for S3 Vectors
- Cost: tracks both LLM inference costs based on token usage and costs for vector storage and queries
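Given per-task records that include the ground-truth tool, accuracy and recall reduce to simple counts, as in this minimal scoring sketch (latency and cost are captured per call and averaged separately):

```python
def score(results: list[dict]) -> dict:
    """Each result: {"expected": str, "selected": str, "retrieved": list[str]}."""
    n = len(results)
    accuracy = sum(r["selected"] == r["expected"] for r in results) / n
    recall_at_20 = sum(r["expected"] in r["retrieved"][:20] for r in results) / n
    return {"accuracy": accuracy, "recall@20": recall_at_20}
```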
Evaluation results
We evaluated two approaches across 62 single-tool selection tasks from the MCPVerse benchmark:
- Baseline: A traditional approach where all 422 tools, including their names, descriptions, and parameters, are provided directly in the model’s context window without any vector search or filtering
- S3 Vectors: Semantic retrieval using an Amazon Bedrock Knowledge Base with an S3 Vectors backend to first filter the most relevant tools before passing them to the model
All experiments used Claude Haiku 4.5 with a temperature of 0.0 to ensure more deterministic and consistent responses. We present the results using top-20 retrieval.
Vector search outperforms baseline
In contrast to the baseline approach, which considers all 422 tools for each query, vector search first retrieves the most relevant tools through semantic search and provides only those candidates to the LLM for final selection. Vector search with top-20 retrieval achieved higher accuracy (82.3% versus 75.8%) and 91.9% recall, with 21% faster end-to-end latency (4.25 s versus 5.41 s). The added retrieval step takes approximately 0.41 s, but the LLM processes 20 tools significantly faster than all 422, so the retrieval overhead is more than offset by reduced inference time.
This demonstrates that S3 Vectors delivers strong performance for small-scale vector search use cases such as tool selection.

LLM cost savings
Using Claude Haiku 4.5 on-demand pricing in us-east-1 as a reference, we observed that vector search reduces LLM inference costs by over 92% compared to the baseline approach ($0.015 versus $0.202 per query). These savings come from dramatically reduced input token usage: only the relevant tools retrieved from the vector store are included in the context, rather than all tools. Refer to the Amazon Bedrock pricing page for the most up-to-date pricing.
Cost considerations for this use case
For this particular use case, a small vector store (422 vectors) with moderate query frequency, S3 Vectors offers compelling cost advantages; a worked estimate follows the pricing list:
S3 Vectors pricing:
- Storage: $0.06 per GB per month
- PUT (initial upload): $0.005 per 1,000 requests
- Query API cost: $2.50 per 1 million queries
- Query processing cost: $0.004/TB (first 100K vectors tier)
- Total vector DB cost for 1 million queries/month: $2.57
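As an illustration, the following back-of-the-envelope calculation applies the listed prices. The vector dimension and metadata size are assumptions, so the result approximates rather than exactly reproduces the $2.57 total.

```python
# Assumptions: 422 vectors, 1,024-dimension float32 embeddings, 1M queries/month.
NUM_VECTORS, DIM, QUERIES = 422, 1024, 1_000_000
BYTES_PER_VECTOR = DIM * 4  # float32, excluding keys and metadata

storage_gb = NUM_VECTORS * BYTES_PER_VECTOR / 1e9
storage_cost = storage_gb * 0.06                # $0.06 per GB per month
put_cost = (NUM_VECTORS / 1000) * 0.005         # $0.005 per 1,000 requests
query_api_cost = (QUERIES / 1_000_000) * 2.50   # $2.50 per 1M queries
processed_tb = QUERIES * NUM_VECTORS * BYTES_PER_VECTOR / 1e12
processing_cost = processed_tb * 0.004          # $0.004 per TB processed

total = storage_cost + put_cost + query_api_cost + processing_cost
print(f"~${total:.2f}/month")  # roughly $2.51 under these assumptions
```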
Refer to the S3 Vectors pricing page for detailed pricing information and the Amazon Web Services (AWS) Pricing Calculator to calculate the cost for your use case.
Conclusion
Our evaluation demonstrates that S3 Vectors provides a cost-effective vector store solution for semantic search workloads. For the agentic tool selection use case, Amazon S3 Vectors delivered strong accuracy and performance while offering significant cost advantages through its pay-per-query pricing model. To get started with S3 Vectors for your own agentic tool selection use case, visit the Amazon S3 Vectors documentation.