Using Strands Agents with Claude 4 Interleaved Thinking
When we introduced the Strands Agents SDK, our goal was to make agentic development simple and flexible by embracing a model-driven approach. Today, we’re excited to highlight how you can use Claude 4’s interleaved thinking beta feature with Strands to further simplify how you write AI agents that solve complex tasks with tools. With a model-driven approach, developers no longer need to define a rigid workflow that calls tools and parses model responses at each step to complete a task. With Strands Agents, you equip a model with tools and a prompt, letting it plan, chain thoughts, call tools, and reflect. Strands manages an event loop around model calls until the model considers the task complete, then returns a response to the client. Let’s consider how it works with this simple example (assuming you have completed the quickstart). The following minimal sketch is illustrative; the model ID and question wording may differ in your setup:
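```python
from strands import Agent
from strands_tools import http_request, python_repl

# Equip the agent with two built-in tools from strands-agents-tools:
# one for making HTTP requests and one for running Python code.
# The model ID below is illustrative; use the Claude 4 Sonnet ID
# available in your AWS Region.
agent = Agent(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    tools=[http_request, python_repl],
)

# A single call starts the event loop: the model plans, calls tools,
# and iterates until it produces a final answer.
agent(
    "Which of these cities is closest to the ISS right now: "
    "Vancouver, Seattle, Portland, or New York?"
)
```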
Equipped with Claude 4 Sonnet, a prompt, and tools to generate Python code and make HTTP requests, Strands has everything it needs to provide an answer. Here’s how the event loop works.
First, Strands structures your prompt and any previous conversation history into a format the language model (like Claude) can understand. Then, Strands automatically loads available tools—these can be MCP Server tools or custom Python functions decorated with @tool. Your Python docstrings become tool descriptions, and type hints define the parameter schemas. In this example, we use two built-in tools from the strands-agents-tools package. The SDK manages errors (like rate limiting or context overflows), performs retries when needed, and emits detailed traces and metrics for observability.
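As a quick illustration, here’s what a custom tool can look like; the function itself is hypothetical:

```python
from strands import Agent, tool

@tool
def count_words(text: str) -> int:
    """Count the number of words in a piece of text.

    Args:
        text: The text to analyze.
    """
    # The docstring becomes the tool description the model sees, and the
    # type hints define the parameter schema it must follow.
    return len(text.split())

# Custom tools are passed to the agent alongside any built-in ones.
agent = Agent(tools=[count_words])
```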
With this setup complete, Strands manages an event loop that invokes the model, handles tool calling, and manages the conversation until the model provides a final answer.
- Model invocation and reasoning: The event loop calls the language model with the current conversation state, prompt, and tools. The model streams its responses, including step-by-step reasoning that you can observe as it “thinks out loud.”
- Tool use detection and execution: If the model decides it needs to call a tool (to fetch data, perform a calculation, etc.), the event loop detects this request, validates it, and executes the corresponding Python function or MCP Server tool with the parameters provided by the model.
- Context update: The result of the tool execution is appended to the ongoing conversation, allowing the model to incorporate the new information into its next iteration, as shown in the sketch after this list.
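You can inspect that growing context directly. A minimal sketch, assuming the agent object from the first example (the message structure follows the Amazon Bedrock Converse API format):

```python
agent("What is 2 to the 64th power?")

# After the run, agent.messages holds the full conversation, including the
# model's toolUse requests and the toolResult blocks appended by Strands.
for message in agent.messages:
    for block in message["content"]:
        print(message["role"], "->", list(block.keys()))
```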
You will see this loop in action when running the ISS example with Python. The model makes a plan to use its tools, calling APIs for real-time data about the ISS and using its Python REPL (read-eval-print loop) to calculate distances and vectors. It then generates an answer and shows its work, responding with a statement like “The ISS is currently positioned over the western Pacific Ocean, making the western North American cities (Vancouver, Seattle, Portland) much closer than New York. Vancouver’s slightly more northern latitude gives it the advantage over Seattle and Portland.”
Supercharging the Strands event loop with Claude 4’s interleaved thinking
Claude 4 introduces a beta feature called “interleaved thinking,” which fits perfectly with Strands’ model-driven approach. It enables Claude to reflect after a tool call and adjust its plan dynamically without needing to complete the current event loop iteration. Interleaved thinking expands the model’s ability to self-reflect, correct errors, and orchestrate a workflow of reasoning and tool use.
If you’re using Amazon Bedrock as your Strands model provider, you can turn on interleaved thinking through the additional request fields you pass to Bedrock. The following sketch shows the configuration; the model ID and token budget are illustrative:
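```python
from strands import Agent
from strands.models import BedrockModel
from strands_tools import http_request, python_repl

model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    additional_request_fields={
        # Opt in to the interleaved thinking beta and enable extended
        # thinking with a token budget for the reasoning blocks.
        "anthropic_beta": ["interleaved-thinking-2025-05-14"],
        "thinking": {"type": "enabled", "budget_tokens": 8000},
    },
)

agent = Agent(model=model, tools=[http_request, python_repl])
```

With this configuration, Claude can emit reasoning blocks between tool calls instead of only at the start of a turn.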
If you enable tracing with Strands, you’ll see additional “reasoningContent” blocks in your traces, including reasoning emitted when Claude 4 decides to interleave thinking after tool calls.
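As a sketch, one way to export those traces, assuming the StrandsTelemetry helper from strands.telemetry (check the observability documentation for your SDK version):

```python
from strands.telemetry import StrandsTelemetry

# Configure OpenTelemetry exporters before creating the agent.
telemetry = StrandsTelemetry()
telemetry.setup_console_exporter()  # print spans to stdout for local debugging
telemetry.setup_otlp_exporter()     # or ship them to an OTLP-compatible collector
```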
In one response, for example, a tool call produced an erroneous calculation, which Claude identified and fixed immediately before continuing to the next iteration of the event loop.
Compared to the traditional ReAct method, the interleaved thinking approach is faster and more fluid. You can think of ReAct like a detective taking notes and making deductions step by step, whereas interleaved thinking is more like a domain expert mentally juggling facts while explaining a concept. In other words, thought and action happen within a single model turn, rather than requiring another complete pass through the loop.
In another example, Claude reduced the number of tool calls by noticing that it could calculate the answer from the information retrieved in its first API call, something that would otherwise have been determined only in a second iteration of the event loop.
These examples only scratch the surface of what you can build with Strands and Claude 4 using interleaved thinking. We’ve published additional agent samples tackling more complex problems and equipped with numerous tools, like this one demonstrating interleaved thinking. Using these examples, you’ll see more dynamic reasoning from Claude, like learning from a failed tool call and retrying with refined parameters, or coming up with new strategies on the fly instead of looping the same strategy across multiple tool calls.
We built Strands Agents to simplify agent development by embracing models like Claude 4 that do a great job of breaking down problems into tool workflows that achieve results. We can’t wait to see what you build with Strands. Join the discussion at https://github.com/strands-agents/sdk-python.
Jawhny Cooke, Vadim Omeltchenko, and Mark Roy contributed to this post.