I love to understand how things work. I especially love when I peel back the curtain and have that “aha” moment. When the core ideas become clear and the implementation lets them shine—that’s delightful.

Client sessions in SDKs are like plumbing for most engineers—boring but essential. You just want to know it’s there when you need it, that the pipes go where they should, and that it’s reliable. You shouldn’t have to invest much energy in understanding what’s going on.

Background on sessions

MCP is about connecting LLMs to tools and resources. Those resources live on a server. The client session is the low-level worker responsible for exchanging messages with the server and deciding what to do with each message type. It forwards requests from the LLM, parses responses, logs notifications, and so on.
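
Concretely, every message the session touches is a JSON-RPC 2.0 payload. A request, its response, and a notification look roughly like this (the values are illustrative):

request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

# The response carries the same id; pairing the two up is the session's job
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "get_weather", "inputSchema": {"type": "object"}}]},
}

# Notifications have no id, so there is nothing to pair; the session just reacts
notification = {
    "jsonrpc": "2.0",
    "method": "notifications/progress",
    "params": {"progressToken": 1, "progress": 50, "total": 100},
}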

I peeled back the curtain on the official SDK and found this:

class BaseSession(
    Generic[
        SendRequestT,
        SendNotificationT,
        SendResultT,
        ReceiveRequestT,
        ReceiveNotificationT,
    ],
):
    _response_streams: dict[
        RequestId, MemoryObjectSendStream[JSONRPCResponse | JSONRPCError]
    ]
    _request_id: int
    _in_flight: dict[RequestId, RequestResponder[ReceiveRequestT, SendResultT]]
    _progress_callbacks: dict[RequestId, ProgressFnT]

I recoiled. This is their base session that the client session inherits from. A bunch of generics, type variables, a RequestResponder context manager, and memory streams. A hairball.

I know it works. Thousands of people rely on it. And I know the maintainers are smart. All that said, BaseSession is convoluted. It’s a burden to think about, and it distracts from higher-level work.

I wanted to build something simpler.

A Simpler Session

Our insight was:

Separate how messages get from A to B (transport) from what to do with messages (session logic)

This separation means we:

  1. Have a clean conceptual model—the implementation lets the protocol shine.
  2. Can test our session code extensively (60 tests vs ~10 in the official SDK).
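
In code, that boundary is small: the transport hands the session incoming messages and delivers outgoing ones, and that’s all it knows. A minimal sketch of the interface (messages() shows up in the message loop below; send() is my assumption for the outbound half):

from typing import Any, AsyncIterator, Protocol

class Transport(Protocol):
    """Moves messages from A to B. Knows nothing about MCP semantics."""

    def messages(self) -> AsyncIterator[dict[str, Any]]:
        """Yield incoming messages until the connection closes."""
        ...

    async def send(self, message: dict[str, Any]) -> None:
        """Deliver one outgoing message."""
        ...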

Diving into the code

Our session looks like this:

class ClientSession:
    def __init__(
        self,
        transport: Transport, # Inject a transport
        client_info: Implementation,
        capabilities: ClientCapabilities,
        create_message_handler: Callable[
            [CreateMessageRequest], Awaitable[CreateMessageResult]
        ]
        | None = None, # How to handle sampling requests
        roots: list[Root] | None = None,
    ):
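
Sending a request is then mechanical: assign an id, park a future keyed by that id, hand the payload to the transport, and await the future. A simplified sketch (the helper names here, like _next_request_id, to_jsonrpc, and _send_cancellation, are illustrative stand-ins):

async def send_request(self, request: Any, timeout: float = 30.0) -> Any:
    request_id = self._next_request_id()  # e.g. a monotonic counter
    future: asyncio.Future[Any] = asyncio.get_running_loop().create_future()
    self._pending_requests[request_id] = future  # Resolved when the response arrives

    await self.transport.send(request.to_jsonrpc(request_id))  # Stand-in serializer
    try:
        return await asyncio.wait_for(future, timeout)
    except asyncio.TimeoutError:
        await self._send_cancellation(request_id)  # Emits notifications/cancelled
        raise TimeoutError(f"Request {request_id} timed out after {timeout}s") from None
    finally:
        self._pending_requests.pop(request_id, None)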

We listen for messages from the transport with

async def _message_loop(self) -> None:
    """Process incoming messages until the session stops."""
    try:
        async for message in self.transport.messages(): # Get messages from the transport
            if not self._running:
                break
            try:
                await self._handle_message(message) # Deal with the message
            except Exception:
                print("Failed to handle message:", message) # Log and keep the loop alive
                continue
    except Exception: # A bare except here would swallow CancelledError
        print("Transport error; shutting down session")
    finally: # Clean up
        self._running = False
        self._cancel_pending_requests("Session stopped")
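
That final cleanup fails any requests still waiting on a response, so callers wake up with an error instead of hanging on a dead connection. Assuming pending requests are futures keyed by request id, as in the sketch above, it stays small:

def _cancel_pending_requests(self, reason: str) -> None:
    """Fail every in-flight request so awaiting callers are released."""
    for request_id, future in self._pending_requests.items():
        if not future.done():
            future.set_exception(ConnectionError(f"Request {request_id} failed: {reason}"))
    self._pending_requests.clear()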

and handle the session logic with

async def _handle_message(self, payload: dict[str, Any]) -> None:
    """Route incoming messages to the appropriate handler."""
    try:
        if self._is_valid_response(payload):
            await self._handle_response(payload)
        elif self._is_valid_request(payload):
            asyncio.create_task(
                self._handle_request(payload),
                name=f"handle_request_{payload.get('id', 'unknown')}",
            )
        elif self._is_valid_notification(payload):
            await self._handle_notification(payload)
        else:
            raise ValueError(f"Unknown message type: {payload}")
    except Exception as e:
        print("Error handling message", e)
        raise
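
The three _is_valid_* checks fall straight out of JSON-RPC 2.0’s message shapes: a response has an id plus a result or an error, a request has a method and an id, and a notification has a method but no id. Roughly:

def _is_valid_response(self, payload: dict[str, Any]) -> bool:
    return "id" in payload and ("result" in payload or "error" in payload)

def _is_valid_request(self, payload: dict[str, Any]) -> bool:
    return "method" in payload and "id" in payload

def _is_valid_notification(self, payload: dict[str, Any]) -> bool:
    return "method" in payload and "id" not in payload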

Compare the basic setup

session = ClientSession(transport, client_info, capabilities)
await session.initialize()

tools = await session.send_request(ListToolsRequest())
print(f"Found {len(tools.tools)} tools")

vs

# Complex setup with generics and streams
async with ClientSession[
    ClientRequest, 
    ClientNotification, 
    ServerResult,
    ServerRequest, 
    ServerNotification
](read_stream, write_stream, ...) as session:
    
    result = await session.send_request(
        request=ListToolsRequest(),
        result_type=ListToolsResult,  # Have to specify this separately
        request_read_timeout_seconds=timedelta(seconds=30)
    )
    print(f"Found {len(result.tools)} tools")

The high level APIs look similar, but take a step below the surface and you feel a big difference.

The Testing Win

We can argue about aesthetics, but tests tell a clearer story. Separating transport from session means it’s easy to write a fake transport and exercise every failure mode we can think of. So far we have 60 tests of the client session; the official SDK has about 10. We’ve tested the session lifecycle, initialization flow, request-response matching, message handlers, transport failures, and so on.
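
A fake transport is just an in-memory queue plus a record of everything the session sent: no sockets, no subprocesses, no task groups. A sketch (deliver is an illustrative helper; client_sent_messages is what the test below inspects):

import asyncio
from typing import Any, AsyncIterator

class FakeTransport:
    """In-memory transport: tests script the inbound side and inspect the outbound."""

    def __init__(self) -> None:
        self._incoming: asyncio.Queue[dict[str, Any]] = asyncio.Queue()
        self.client_sent_messages: list[dict[str, Any]] = []

    async def messages(self) -> AsyncIterator[dict[str, Any]]:
        while True:  # The session's message loop is stopped by cancellation
            yield await self._incoming.get()

    async def send(self, message: dict[str, Any]) -> None:
        self.client_sent_messages.append(message)

    def deliver(self, message: dict[str, Any]) -> None:
        """Test helper: queue a message as if the server sent it."""
        self._incoming.put_nowait(message)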

For example, with a little mocking, we can do things like

async def test_request_timeout_raises_timeout_error_and_sends_cancellation(self):
    request = PingRequest()

    with pytest.raises(TimeoutError, match="Request test-id-1 timed out after 0.05s"):
        await self.session.send_request(request, timeout=0.05)

    # Verify what actually happened
    await self.wait_for_sent_message("ping")
    await self.wait_for_sent_message("notifications/cancelled")
    assert len(self.transport.client_sent_messages) == 2

    # Make more asserts about the cancellation message...

I haven’t found a similar test in the official SDK suite. The closest I could find is:

@pytest.mark.anyio
async def test_request_cancellation():
    # 20 lines: Create server with decorators and nonlocal variables
    def make_server() -> Server:
        # Complex server setup...
    
    # 15 lines: Async task coordination with events  
    async def make_request(client_session):
        # Event coordination logic...
    
    # 20 lines: Nested context managers and task groups
    async with create_connected_server_and_client_session(make_server()) as client:
        async with anyio.create_task_group() as tg:
            # Complex async coordination...
            
    # Assertion buried in exception handler somewhere

The test is a bear to set up and follow: 80 lines to verify that a cancelled request raises an error. It’s no surprise the official SDK doesn’t have as many tests.

The insight is:

Clean architecture enables thorough testing

As I flesh out the SDK, I may discover I’m wrong about the architecture, but I feel good knowing the tests will show me exactly where my current ideas break down. I hope users will feel the same way.

Wrapping up

Going forward I want to keep the same focus on simplicity. My bet is that the returns to clarity will compound. We see hints of it in how easy it is to write tests. Those tests reveal what’s broken or where usage doesn’t feel right. That leads to further clarity and a better design. A virtuous cycle.

We’ve still got to build the server session (nearly a mirror image), the transports, and then the high-level APIs. Daunting—but the path is clear.

MCP has a lot of simple, powerful ideas built into it. A beautiful SDK can help us find out whether MCP encapsulates the right ideas for an AI-driven future. Let’s build one and see!