The problem
Your AI assistant burns through multiple attempts on a simple request like “let’s discuss this PR.” First it tries to fetch the PR but forgets GitHub’s query syntax. Then it asks permission to retry. It gets the PR, but needs another request for the review comments, and fails again because it mixed up the API parameters. You’re watching it stumble through what should be one smooth conversation.
The insight
MCP currently follows a “smart host, dumb servers” pattern: one AI learns every API across all of its server connections. Instead, flip it to “orchestrating host, smart servers.” Put specialist agents inside MCP servers and let the host delegate to them in natural language.
(See the discussion that sparked this idea.)
An example
Current approach:
- You: “Let’s discuss PR #42 in the Python SDK”
- Host: calls GitHub to fetch the PR, but gets the query syntax wrong and fails
- Host: “That call failed. May I retry?”
- Host: fetches the PR, then makes a second call for the review comments with mixed-up parameters, and fails again
With server-side agents:
- Host: “Let’s discuss PR #42 in the Python SDK”
- GitHub agent: “Got the PR and 8 review comments. The main discussion is about error handling. Want a summary?”
- Host: “Yes, focus on the unresolved threads”
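The delegation flow above can be sketched in plain Python. Everything here (the `Host` and `GitHubAgent` classes, the `delegate` and `handle` methods, the canned reply) is a hypothetical illustration of the pattern, not the MCP SDK:

```python
# Sketch of "orchestrating host, smart servers".
# All names are illustrative; a real specialist would live behind an MCP transport.

class GitHubAgent:
    """Specialist agent inside an MCP server. It owns the API details
    (query syntax, parameters, retries) so the host never sees them."""

    def handle(self, request: str) -> str:
        # A real server would call the GitHub API here, validating
        # parameters and retrying on rate limits. We simulate the result.
        pr = {"number": 42, "comments": 8, "topic": "error handling"}
        return (
            f"Got PR #{pr['number']} and {pr['comments']} review comments. "
            f"The main discussion is about {pr['topic']}. Want a summary?"
        )

class Host:
    """The host keeps a rolodex of specialists and delegates in natural
    language instead of composing API calls itself."""

    def __init__(self):
        self.rolodex = {"github": GitHubAgent()}

    def delegate(self, specialist: str, request: str) -> str:
        return self.rolodex[specialist].handle(request)

host = Host()
reply = host.delegate("github", "Let's discuss PR #42 in the Python SDK")
print(reply)
```

The point of the sketch: the host’s only job is routing one natural-language request to the right rolodex entry; every GitHub-specific detail stays inside the agent.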
Why this matters
MCP’s sampling capability enables real dialogue between server agents and the host LLM. Your AI maintains a simple rolodex of specialists instead of memorizing hundreds of API signatures. Each specialist absorbs the complexity internally: rate limits, parameter validation, clarification requests.
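A minimal sketch of how sampling closes that loop, assuming a host-provided `sample` callback standing in for an MCP sampling request (the stub `host_llm` just labels the prompt rather than running a model):

```python
# Sketch of MCP sampling: the server-side agent borrows the *host's* LLM
# mid-task. `sample` is a stand-in for the host's sampling handler.

def host_llm(prompt: str) -> str:
    # Stub for the host's model; a real host would run an actual completion.
    return f"[summary of: {prompt[:40]}...]"

class GitHubAgent:
    def __init__(self, sample):
        self.sample = sample  # host-provided sampling callback

    def summarize_threads(self, comments: list[str]) -> str:
        # The agent filters the threads itself, then asks the host's LLM
        # (via sampling) to turn the unresolved ones into prose.
        unresolved = [c for c in comments if not c.startswith("[resolved]")]
        return self.sample("Summarize these review threads: " + " | ".join(unresolved))

agent = GitHubAgent(sample=host_llm)
summary = agent.summarize_threads(
    ["[resolved] nit: rename variable", "Should we retry on 429s?", "Error handling looks off"]
)
print(summary)
```

In real MCP this is the `sampling/createMessage` request the server sends back to the client, which runs the completion with its own model and user approval, so the specialist never needs its own model or API key.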
Instead of perfect API calls, you get natural delegation.