Tools are the most important part of any agentic architecture. Agents become powerful with tools, but tools are also a major cause of context rot. Roughly 30 tools is the practical limit for most agents - and even that is a huge token count just for tool descriptions.
Building agents is a balancing act. Too few tools and the agent is limited; too many and the context bloats. You have to get it just right.
Manus has 12 tools. Claude Code has 10-12. Yet they outperform agents with 50+ tools. Why?
They asked a different question: “What if we gave the agent a computer instead of a toolbox - and let it figure out what needs to be done?”
The Sandbox Philosophy
Sandboxing is not a new term - its usage just exploded with the rise of AI and agentic architectures. A sandbox is an isolated virtual environment where an agent can execute LLM-generated scripts securely, without affecting the codebase or the local environment.
Here’s the simple mental model:
User asks: “What are the DNS records for example.com?”
Traditional agent checks its tool catalogue. No DNS tool. It apologizes and fails.
Sandbox agent thinks differently: “I don’t have a tool for this. But I have Python and a computer.”
It writes a script, installs dnspython, runs it, reads the output, returns the answer.
No new tool was added. The agent built what it needed, when it needed it.
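For intuition, here is roughly the kind of script the agent generates in that moment. This is a sketch - the exact code differs on every run, and the record types queried here are illustrative:

# The agent installs its own dependency, then uses it
import subprocess, sys
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "dnspython"], check=True)

import dns.resolver

for record_type in ("A", "MX", "NS", "TXT"):
    try:
        for rdata in dns.resolver.resolve("example.com", record_type):
            print(record_type, rdata.to_text())
    except dns.resolver.NoAnswer:
        pass  # domain has no records of this type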
Let’s build a simple agent
Let’s use E2B and LangGraph to build a simple example agent. Full code is available on GitHub.
First, we need to tell the agent that if a user query falls beyond the capabilities of the provided tool catalogue, it should generate a script, execute it in the sandbox environment, and return the response to the user. We do this with the system_prompt:
system_prompt = """You are a helpful assistant. If you don't have tools to solve a query, generate a script and use run_code to execute it. """
Then we have to create a tool that gives the agent the capability to execute LLM-generated scripts inside a sandbox environment. Here is the simplest version of the tool:
from e2b_code_interpreter import Sandbox
from langchain_core.tools import tool

sandbox = Sandbox()  # an isolated E2B cloud VM

@tool
def run_code(code: str) -> str:
    """Run Python code in the sandbox."""
    return str(sandbox.run_code(code))  # Execution object -> string for the LLM
That’s the core. One tool. One instruction. The agent handles the rest.
The full graph is simple: Agent reasons → generates code → executes in sandbox → reads output → responds.
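Wired up, the whole agent fits in a few lines. Here is a sketch using LangGraph's prebuilt ReAct agent - the model string and message format are my assumptions, and the prompt parameter name varies across LangGraph versions:

from langgraph.prebuilt import create_react_agent

agent = create_react_agent(
    "anthropic:claude-sonnet-4-5",  # any tool-calling model works
    tools=[run_code],
    prompt=system_prompt,
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What are the DNS records for example.com?"}]}
)
print(result["messages"][-1].content)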
Why This Works
Traditional approach: anticipate every user request and build a tool for each. Fifty tools later: bloated context, and still missing edge cases.
Sandbox approach: Give the agent a computer. One tool, infinite capabilities.
The agent with fewer tools ends up more capable. It learned the most human skill - figuring shit out on the fly. It acts like a power user.
This isn’t a replacement for tools. Search, file read/write, and DB queries are high-frequency operations; they deserve dedicated, reliable tools. But for the things the agent never anticipated, the sandbox approach is great.
Think of it as a hybrid:
- 10-15 core tools for the hot path
- 1 sandbox tool for everything else
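In code, the hybrid is nothing exotic - it's just the tool list. The hot-path tools below are hypothetical stubs; run_code is the escape hatch from earlier:

from langchain_core.tools import tool

@tool
def read_file(path: str) -> str:
    """Read a file from the workspace."""
    with open(path) as f:
        return f.read()

@tool
def web_search(query: str) -> str:
    """Search the web for a query."""
    return "stub: plug in your search API here"

# 10-15 tools like these for the hot path, plus the sandbox escape hatch,
# passed to the agent exactly as before
tools = [read_file, web_search, run_code]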
Scaling Beyond The Sandbox
What if you actually need 100+ tools?
Loading them all at agent initialization bloats your context window. But there’s another way.
Pre-write scripts for each tool and store them in the sandbox environment. Give the agent basic file system access - grep, glob, ls. Now instead of loading 100 tool descriptions, you tell the agent:
“Search the /tools directory, find the right script for the user’s query, execute it.” The agent becomes a tool discovery system. It greps for relevant scripts, reads the one it needs, runs it, returns the result.
100 tools, near-zero context cost. The tool descriptions live in the sandbox, not in your prompt.
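Here is a sketch of the setup, assuming E2B's files API and a hypothetical weather script - the discovery instruction replaces 100 tool descriptions in the prompt:

# Seed the sandbox with pre-written scripts (one hypothetical example here)
weather_script = '''
import sys, urllib.request
city = sys.argv[1]
# wttr.in returns a one-line weather summary
print(urllib.request.urlopen(f"https://wttr.in/{city}?format=3").read().decode())
'''
sandbox.files.write("/tools/get_weather.py", weather_script)
# ...repeat for the other 99 tools...

discovery_prompt = """You are a helpful assistant. Pre-written tools live as
Python scripts under /tools in your sandbox. For each query: use run_code to
ls/grep /tools, read the most relevant script, execute it, and return the result."""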
Every tool you add is maintenance. Every tool you add is context bloat. Every tool you add is another thing that can break.
But here’s the thing: LLMs are really good at following instructions now. They can reason, write code, debug, and retry. Relying on the LLM to figure things out isn’t a hack anymore. It’s the architecture.
If you are building agents, let’s talk.