Discussion on #1 - MCP Testing #1225
Replies: 17 comments 35 replies
Hi! I would love to work on the MCP testing idea. I have a few questions to help clarify the expectations and scope of the project.
1. Would the testing suite be a separate repository/module, or integrated directly into the core API Dash codebase?
2. What is the expected interface? Should it be a standalone web interface, or integrated seamlessly into the existing API Dash interface?
3. If it is a web interface, is a local standalone version fine for this idea?
4. How would client testing work in practice? Would it be a Claude/Cursor-like interface where users provide an API key to access an LLM, and then they can easily connect their created MCP server and use the API to interact with it?
5. For enabling the creation of MCP servers, should the solution be tailored strictly towards users who don't know how to code, or accommodate both developers and non-developers?
6. Should it operate similarly to the standard MCP Inspector, or actively extend it?
7. Alternatively, do we want a completely standalone suite specifically tailored for API Dash? Is forking the MCP Inspector repo and using it as a base considered a good approach?
Looking forward to your thoughts!
Hi @animator, following up from #1198. I’ve been digging into the DashBot logic and current tool-calling flow. I'd like to prototype a Provider-Adapter for the transport layer. This would keep the UI stable while switching between stdio and WebSocket servers. I noticed in #1173 that error handling for malformed calls could be more robust, so I’m also ready to start on a JSON-RPC validation layer. This would ensure DashBot handles timeouts and schema mismatches gracefully during testing. Should I focus on this core transport logic first, or is the testing interface a higher priority?
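The Provider-Adapter idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `McpTransport`, `LoopbackTransport`, and `callMethod` are hypothetical names, not API Dash or MCP SDK APIs, and the in-memory loopback stands in for real stdio or WebSocket adapters.

```typescript
// Hypothetical names throughout; nothing here is an API Dash or MCP SDK API.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// Callers (UI, test runner) depend only on this interface, so a stdio or
// WebSocket adapter could be swapped in without touching them.
interface McpTransport {
  readonly kind: string;
  send(request: JsonRpcRequest): Promise<JsonRpcResponse>;
}

// In-memory stand-in for a real server connection.
class LoopbackTransport implements McpTransport {
  readonly kind = "loopback";

  async send(request: JsonRpcRequest): Promise<JsonRpcResponse> {
    if (request.method === "ping") {
      return { jsonrpc: "2.0", id: request.id, result: {} };
    }
    // -32601 is the JSON-RPC 2.0 "Method not found" code.
    return {
      jsonrpc: "2.0",
      id: request.id,
      error: { code: -32601, message: `Method not found: ${request.method}` },
    };
  }
}

let nextId = 1;

// Transport-agnostic helper: JSON-RPC errors become thrown Errors that
// carry the transport kind, so failures are easy to attribute.
async function callMethod(
  transport: McpTransport,
  method: string,
  params: Record<string, unknown> = {},
): Promise<unknown> {
  const response = await transport.send({ jsonrpc: "2.0", id: nextId++, method, params });
  if (response.error) {
    throw new Error(`[${transport.kind}] ${response.error.code}: ${response.error.message}`);
  }
  return response.result;
}
```

A stdio or WebSocket adapter would implement the same `send` method, which is what would let the UI stay unchanged when the transport is switched.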
Hi everyone. Previously, I built an MCP-based AI cloud orchestration system (MCPilot) using Python, Flask, Docker, Apache CloudStack APIs, and Claude AI. In that project, I designed an MCP layer that automated multi-step VM provisioning workflows and reduced manual effort significantly. For MCP Testing, I’m particularly interested in exploring how we can design an intelligent validation layer that understands API specifications and workflows, generates structured test strategies (including edge cases), and executes multi-step tests while maintaining context. I’d like to begin by contributing to smaller issues to better understand the codebase, and then gradually propose improvements for strengthening MCP server/client testing. Please let me know where I can start contributing.
Hi, I researched and ideated on this idea. Here is a summary:

The idea is basically about making AI tools safer before they are used in the real world. It works by first using an automated “attacker” (Red Agent) to try to trick or break the system, especially by exploiting the AI’s tendency to be overly helpful. We feed the attacking agent the tool description and related information, and it generates attacks accordingly. The results of the failed and secured attacks are recorded and stored.

Each tool has a usage policy, and we set this policy autonomously to prevent attacks: a “defender” agent analyzes the results and writes policies for the tool to close the vulnerabilities. It then tests the system again to make sure the fix actually works. Finally, it adds a layer that checks whether every action the AI takes truly matches the user’s original intent. Overall, it turns the AI from being blindly helpful into being careful, secure, and aligned with user goals.

Here are the links to the research papers:

Please share your inputs on this idea and how it can be improved.
Aarav Dudeja
Hi @animator, this is what I have thought about and researched so far. The core problem I identified: the MCP ecosystem has grown extremely fast (97M monthly SDK downloads and 10,000+ active servers), yet the developer tooling around testing is almost nonexistent. Right now most developers test MCP servers by manually connecting to Claude Desktop and typing prompts, with:
Victor Dibia also pointed out earlier this year that MCP testing options are “severely limited”, with poor debugging and developer experience. Even the best existing tool, MCP Inspector, is mainly a development inspector and lacks features for team testing, historical tracking, or CI/CD integration.
Hi @animator and mentors!
Key components I'm thinking of:
Question for mentors: For the transport layer, should I prioritize stdio first or HTTP/SSE first, or design an abstracted transport interface that supports both from day one?
Hi @animator! I'm very interested in GSoC 2026 Idea #1 - MCP Testing. After exploring the thread and researching the MCP ecosystem, here are my initial thoughts: The biggest gap I see is not just protocol visibility (which MCP Inspector covers) but structured test generation and validation, specifically:
I'm particularly interested in the assertion engine side, using schema-driven validation to catch edge cases that manual testing misses. I've started setting up API Dash locally and exploring the codebase.
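As a rough illustration of schema-driven edge-case generation, the sketch below derives negative test inputs from a tool's input schema (MCP tools describe their arguments with a JSON Schema in `inputSchema`). It assumes a deliberately tiny schema subset, and `ToolSchema`, `EdgeCase`, and `edgeCases` are hypothetical names invented for this example.

```typescript
// Hypothetical sketch: supports only flat object schemas with primitive types.
type JsonType = "string" | "number" | "boolean";

interface ToolSchema {
  type: "object";
  properties: Record<string, { type: JsonType }>;
  required?: string[];
}

interface EdgeCase {
  label: string;
  args: Record<string, unknown>;
}

// One valid and one deliberately mistyped sample value per primitive type.
const SAMPLE: Record<JsonType, unknown> = { string: "x", number: 1, boolean: true };
const WRONG: Record<JsonType, unknown> = { string: 42, number: "not-a-number", boolean: "yes" };

// Build a fully valid argument object to use as the baseline.
function validArgs(schema: ToolSchema): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const [field, prop] of Object.entries(schema.properties)) {
    args[field] = SAMPLE[prop.type];
  }
  return args;
}

// Mutate the baseline one field at a time: drop each required field,
// then give each field a value of the wrong type.
function edgeCases(schema: ToolSchema): EdgeCase[] {
  const generated: EdgeCase[] = [];
  const base = validArgs(schema);
  for (const field of schema.required ?? []) {
    const args = { ...base };
    delete args[field];
    generated.push({ label: `missing required "${field}"`, args });
  }
  for (const [field, prop] of Object.entries(schema.properties)) {
    generated.push({
      label: `wrong type for "${field}"`,
      args: { ...base, [field]: WRONG[prop.type] },
    });
  }
  return generated;
}

// Example schema for an imaginary weather tool.
const weatherSchema: ToolSchema = {
  type: "object",
  properties: { city: { type: "string" }, days: { type: "number" } },
  required: ["city"],
};
const weatherCases = edgeCases(weatherSchema);
```

Each generated case would then be sent to the server as a tool call, with the assertion being that the server rejects it cleanly rather than crashing or silently accepting it.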
Hi @animator, I went through the MCP Testing discussion and also spent time understanding how API Dash is structured today before thinking about an initial direction here.

One thing that stands out to me is that a very broad “MCP DevTools” vision can become proposal-heavy quite quickly, so I think the most useful starting point would be a smaller but strong first slice that solves real developer pain during MCP development.

My current view is that the first milestone should focus on a repeatable MCP testing core, not just an inspector-like live debugging flow:
- transport abstraction so different MCP server transports can be tested through one execution layer
- protocol validation for malformed calls, schema / JSON-RPC mismatches, and timeout/error cases
- test session execution + artifact capture so runs become reproducible and comparable
- a developer-facing UI for defining/rerunning scenarios and inspecting failures

I think this would differentiate the project in two ways:
- from a pure inspector/debugger workflow, because the goal becomes repeatable testing
- from an overly broad all-at-once platform design, because the first milestone remains implementation-friendly and easy to validate

Since API Dash already emphasizes structured workflows, persistence, and testing in its current architecture, I feel MCP Testing should extend that philosophy into the MCP ecosystem rather than begin as a standalone mockup-heavy tool.

I’m continuing to study the codebase and would love to discuss whether the initial priority should be:
- a Node/TypeScript execution + validation engine first, or
- a thin end-to-end vertical slice with both engine and minimal UI together.
Hi everyone, @animator 👋 I’ve been exploring the Model Context Protocol ecosystem and I find the idea of strengthening the MCP developer tooling very interesting. One idea that could improve the developer experience would be to build an MCP Testing Playground where developers can connect to an MCP server and interactively test the exposed tools. The interface could allow users to:
Additionally, a CLI-based MCP test runner could be useful for automation. Developers could define test cases in JSON/YAML and run them against an MCP server to validate tool behavior. Example workflow:
I’m currently reading more about MCP implementations and would love to hear how the maintainers envision the testing workflow for MCP servers within the API Dash ecosystem.
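To make the declarative test-runner idea more concrete, here is a minimal sketch. The test-case format (`ToolTestCase`), the `ToolInvoker` signature, and the in-process `fakeServer` are all assumptions made up for this example; a real runner would load the JSON from a file and invoke tools over an MCP transport.

```typescript
// Hypothetical test-case format; field names are illustrative only.
interface ToolTestCase {
  name: string;
  tool: string;
  args: Record<string, unknown>;
  expect: { isError: boolean; textIncludes?: string };
}

interface ToolResult {
  isError: boolean;
  text: string;
}

type ToolInvoker = (tool: string, args: Record<string, unknown>) => ToolResult;

// Inline JSON stands in for a test file on disk.
const suiteJson = `[
  {"name": "adds two numbers", "tool": "add", "args": {"a": 2, "b": 3},
   "expect": {"isError": false, "textIncludes": "5"}},
  {"name": "rejects unknown tool", "tool": "nope", "args": {},
   "expect": {"isError": true}}
]`;

// In-process fake so the sketch runs without a real MCP server.
const fakeServer: ToolInvoker = (tool, args) => {
  if (tool === "add") {
    return { isError: false, text: String(Number(args["a"]) + Number(args["b"])) };
  }
  return { isError: true, text: `unknown tool: ${tool}` };
};

// Run every case and compare the observed result with the expectation.
function runSuite(
  testCases: ToolTestCase[],
  invoke: ToolInvoker,
): { name: string; passed: boolean }[] {
  return testCases.map((testCase) => {
    const result = invoke(testCase.tool, testCase.args);
    const errorMatches = result.isError === testCase.expect.isError;
    const textMatches =
      testCase.expect.textIncludes === undefined ||
      result.text.includes(testCase.expect.textIncludes);
    return { name: testCase.name, passed: errorMatches && textMatches };
  });
}

const suite = JSON.parse(suiteJson) as ToolTestCase[];
const suiteResults = runSuite(suite, fakeServer);

// A CLI wrapper would pass this to the process exit call so CI can gate on it.
const exitCode = suiteResults.every((r) => r.passed) ? 0 : 1;
```

Keeping the runner a pure function of (cases, invoker) means the same suite can run against a fake, a local server, or a remote one.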
Hey everyone! 👋 My name is Mouhamed Diallo, a Junior CS student at DAUST (Dakar American University of Science and Technology) in Senegal. I'm applying for GSoC 2026 and this project immediately caught my attention. I work with MCP daily through Cline as part of my development workflow, so I've seen firsthand how powerful the protocol is — but also how frustrating it can be to debug tool calls when something goes wrong, with no structured way to verify what's happening under the hood. That pain point is exactly what makes this project feel genuinely needed. That said, I want to be honest: while I have a working understanding of MCP as a consumer (connecting to servers, invoking tools), I'm still getting familiar with the deeper internals of the protocol — the transport layer negotiation, the full JSON-RPC message lifecycle, and how a well-architected client/server test harness should be structured. A few questions that would really help me scope this properly:
I've gone through the contribution guidelines, the developer guide, and the code walkthrough video, and I've read the MCP Apps resource linked in the idea. I'm excited to dig deeper, and I just want to make sure I'm building in the right direction before drafting the full proposal. Thanks for putting this idea together. Looking forward to the conversation! 🙏
I went through all comments in this thread and did a quick repo scan before drafting my proposal direction. What I think is the highest-impact gap for API Dash Project #1 is transport-parity + deterministic replay, not just interactive inspection. Proposed core slice (Node + TypeScript backend, React UI):
Why this matters: the thread already discusses inspector-like UX and protocol visibility. The missing piece is reproducible pass/fail behavior across transports with the same scenario inputs, which is what teams need for regression testing.

I also read Ashita Prasad’s MCP Apps article (https://dev.to/ashita/a-practical-guide-to-building-mcp-apps-1bfm). It highlights host-level behaviors that should be testable explicitly, especially ui/initialize handshake, hostCapabilities handling, and tools/call/UI metadata behavior. I think MCP App checks should be first-class assertions, not only visual preview.

One specific scope question for mentors (Ankit/Ashita): for MVP, should transport parity target stdio + Streamable HTTP (with SSE fallback) from day one, or should we lock MVP to stdio first and add HTTP/SSE in phase 2 once the scenario runner + conformance classifier are stable?

I can start by implementing the transport-agnostic scenario schema and failure taxonomy first, then wire adapters incrementally.
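A failure taxonomy plus a transport-parity check could look roughly like the sketch below. The category names, `RunOutcome`, `classify`, and `checkParity` are hypothetical and not taken from any existing codebase. The one protocol fact relied on is that MCP reports tool execution failures inside a successful result via `isError: true`, while protocol failures arrive as JSON-RPC `error` objects.

```typescript
// Hypothetical taxonomy; a real one would likely be finer-grained.
type FailureClass = "transport" | "protocol" | "tool" | "none";

// Outcome of running one scenario step over one transport.
interface RunOutcome {
  transport: string; // e.g. "stdio" or "streamable-http"
  connected: boolean;
  response?: {
    error?: { code: number };
    result?: { isError?: boolean };
  };
}

// Classify from the outermost layer inwards: connection, then JSON-RPC,
// then the tool-level isError flag.
function classify(outcome: RunOutcome): FailureClass {
  if (!outcome.connected || outcome.response === undefined) return "transport";
  if (outcome.response.error !== undefined) return "protocol";
  if (outcome.response.result?.isError === true) return "tool";
  return "none";
}

// The same scenario should classify identically on every transport.
function checkParity(outcomes: RunOutcome[]): {
  consistent: boolean;
  byTransport: Record<string, FailureClass>;
} {
  const byTransport: Record<string, FailureClass> = {};
  for (const outcome of outcomes) {
    byTransport[outcome.transport] = classify(outcome);
  }
  const distinct = new Set(Object.values(byTransport));
  return { consistent: distinct.size <= 1, byTransport };
}

// Example: the server behaves differently over HTTP than over stdio.
const parityReport = checkParity([
  { transport: "stdio", connected: true, response: { result: { isError: false } } },
  { transport: "streamable-http", connected: true, response: { error: { code: -32602 } } },
]);
```

Comparing classifications instead of raw payloads keeps the parity check stable against harmless differences such as request ids or timing.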
Hi everyone! I'm Gaurav, a first-year MCA student from India. My background is in backend development, DevOps, testing infrastructure, and workflow automation using n8n. I have experience with Python, Node.js, and React, and have worked extensively with RESTful APIs. I am applying for GSoC 2026 and am very interested in Idea #1: MCP Testing. I recently built integration testing infrastructure for the Django framework. I am excited about creating strong testing features for MCP servers and clients using Python and Node. I am currently reviewing the MCP specification and looking into Issue #1225 to understand the current testing issues. I look forward to contributing!
Hi @animator and mentors 👋 I built a config-driven RAG evaluation pipeline with automated regression testing via pytest + GitHub Actions, benchmarking retrieval metrics (Recall@K, MRR) across multiple configurations. I'm currently exploring the API Dash codebase and will be submitting a proposal draft shortly. Happy to discuss my approach and contribute a small PR in the meantime.
Hi, I'm Nazeef. I'm applying for the MCP Testing project.
Hi @animator and everyone! I've been following this discussion and wanted to share something I've been thinking about: should the testing engine be designed CI-first or UI-first?

My initial instinct was to start from the UI side, connecting to a server and browsing tools visually. But after actually building a prototype, I changed my mind. I think the engine needs to be the core product, and the UI is just a viewer on top of it.

The reasoning is simple: if the engine can run headless from day one, CI integration becomes a midterm deliverable instead of something you scramble to add in the last week. The project delivers real value even if the UI takes longer than expected. And every test becomes reproducible: just a CLI command with clear output, no manual clicking required. It also means the engine can live as a standalone package, which fits nicely with how API Dash already structures things (curl_parser, postman, genai as separate packages).

I put together a TypeScript prototype to test this idea, with no MCP SDK, just raw stdio transport and JSON-RPC from scratch. It connects to any MCP server, runs 19 conformance tests across 5 categories (Protocol, Discovery, Schema, Execution, Edge Cases), and reports pass/fail with proper exit codes: github.com/vinimlo/mcp-conformance

One thing I think could really tie the whole system together is a recording + replay pipeline: the conformance engine records real JSON-RPC traffic to fixture files, and then a mock server replays those fixtures for deterministic client-side testing. One recording flow that serves both server testing and client testing.

Here is the overall architecture I am proposing:

Also, after reading the MCP Apps practical guide, I am convinced that MCP Apps conformance testing (ui:// resource validation, ui/initialize handshake, hostContext CSS injection) should be part of the core scope rather than a stretch goal. Would love to hear everyone's thoughts on this!
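The recording + replay pipeline described above might be sketched like this. It is not taken from the linked prototype: `RpcCall`, `Exchange`, `record`, `replay`, and the in-process `liveServer` are hypothetical, and the fixture lives in memory here where a real engine would write it to a file.

```typescript
// Hypothetical shapes; request ids are deliberately left out so that
// replays do not depend on the order or count of earlier calls.
interface RpcCall {
  method: string;
  params?: unknown;
}

interface Exchange {
  request: RpcCall;
  response: unknown;
}

type Handler = (request: RpcCall) => unknown;

// Lookup key for a request. Note that JSON.stringify is sensitive to
// property order, so params must be built consistently.
function fixtureKey(request: RpcCall): string {
  return JSON.stringify([request.method, request.params ?? null]);
}

// Wrap a live handler so every exchange is appended to the fixture.
function record(handler: Handler, fixture: Exchange[]): Handler {
  return (request) => {
    const response = handler(request);
    fixture.push({ request, response });
    return response;
  };
}

// Build a mock handler that answers only from recorded exchanges.
function replay(fixture: Exchange[]): Handler {
  const recorded = new Map<string, unknown>();
  for (const exchange of fixture) {
    recorded.set(fixtureKey(exchange.request), exchange.response);
  }
  return (request) => {
    const lookup = fixtureKey(request);
    if (!recorded.has(lookup)) {
      throw new Error(`no recorded response for ${lookup}`);
    }
    return recorded.get(lookup);
  };
}

// Stand-in for a real MCP server.
const liveServer: Handler = (request) =>
  request.method === "tools/list" ? { tools: [{ name: "add" }] } : { error: "unsupported" };

const fixture: Exchange[] = [];
const recording = record(liveServer, fixture);
const liveAnswer = recording({ method: "tools/list" });

// Round-trip through JSON to mimic saving the fixture and loading it later.
const loadedFixture = JSON.parse(JSON.stringify(fixture)) as Exchange[];
const mockServer = replay(loadedFixture);
const replayedAnswer = mockServer({ method: "tools/list" });
```

Failing loudly on an unrecorded request is a deliberate choice in this sketch: a client test that drifts from the recorded traffic should break rather than receive a made-up answer.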
Hi, I'm Aniruddh, a third-year BTech student applying for GSoC 2026. I'd like to work on the MCP JSON-RPC Validation Layer. I've reviewed the codebase and plan to implement a type-safe validation engine in TypeScript using Zod, following the project's pattern of structured validation seen in validation_utils.dart. I also plan to integrate a layer-based error classifier to provide clear diagnostics for transport and protocol-level failures. Could you assign this to me?
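For illustration, a validation layer with a layer-based classifier could start from something like the sketch below. To stay dependency-free it uses hand-written type guards in place of Zod, so it shows the shape of the idea rather than the proposed implementation. `validateMessage`, `Validation`, and the choice to treat unparseable input as a transport-layer problem are assumptions; the error codes -32700 (Parse error) and -32600 (Invalid Request) come from JSON-RPC 2.0.

```typescript
// Hypothetical result type: either a recognised message kind or a
// diagnostic that names the layer at which validation failed.
type Layer = "transport" | "protocol";

type Validation =
  | { ok: true; kind: "request" | "notification" | "response" }
  | { ok: false; layer: Layer; code: number; reason: string };

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function invalidRequest(reason: string): Validation {
  return { ok: false, layer: "protocol", code: -32600, reason };
}

// Validate one raw frame. Batches and null ids are out of scope here.
function validateMessage(raw: string): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Assumption: bytes that are not JSON point at framing/transport issues.
    return { ok: false, layer: "transport", code: -32700, reason: "not valid JSON" };
  }
  if (!isObject(parsed) || parsed["jsonrpc"] !== "2.0") {
    return invalidRequest('missing or wrong "jsonrpc" version');
  }
  const id = parsed["id"];
  const hasId = typeof id === "string" || typeof id === "number";
  if (typeof parsed["method"] === "string") {
    // A method with an id is a request; without one it is a notification.
    return { ok: true, kind: hasId ? "request" : "notification" };
  }
  // A response carries exactly one of "result" or "error".
  const hasResult = "result" in parsed;
  const hasError = "error" in parsed;
  if (hasId && hasResult !== hasError) {
    return { ok: true, kind: "response" };
  }
  return invalidRequest("neither a request nor a well-formed response");
}
```

With Zod, the guards would become schemas and the classifier would map schema failures to the same `layer` and `code` fields.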
Hi @animator, I submitted my PoC to gsoc-poc (PR #15). It is a Python test client that connects directly to the sample-mcp-apps-chatflow server and runs JSON-RPC conformance checks, tool/resource discovery, and actual tool calls using the sales data. I went with Python first to nail the logic, and I'm working on a TypeScript version next to match the stack. Quick scope question: is the testing engine focused on server-side validation, or is client testing also in scope from the start?
The Model Context Protocol (MCP) acts as the API layer of the AI world, defining a standard way for AI agents to discover, understand, and interact with tools, data, and software systems - much like REST or GraphQL do for traditional applications.
In this project, your task is to strengthen the MCP Developer ecosystem by designing and building the capability to create & test MCP servers and clients.
Also, go through this resource to understand an MCP Apps server so that you can incorporate its testing in the project.
Project Tech Stack - React, Node, TypeScript
Please feel free to share your ideas/research below.