Headless MCP server for ZeroEval: inspect traces, manage judges and prompts, submit feedback, run optimizations, and safely deploy to production.
Choose your preferred MCP client
Claude Code: run this command in your terminal:
claude mcp add --transport http "zeroeval-mcp" https://mcp.zeroeval.com/mcp
Cursor: click "Open in Cursor" to add this MCP server, or add it manually: Settings → MCP → Add server
VS Code: click "Open in VS Code" to add this MCP server, or add it manually: Settings → MCP → Add server
VS Code Insiders: click "Open in VS Code Insiders" to add this MCP server, or add it manually: Settings → MCP → Add server
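For the manual route, the exact config entry is not spelled out here. The sketch below generates entries for the two editors, assuming Cursor reads remote servers from an `mcpServers` map and VS Code from a `servers` map with an explicit `http` type. Both shapes are assumptions to check against each editor's own documentation; only the server name and URL come from the command above.

```python
import json

# Name and URL taken from the `claude mcp add` command above.
SERVER_NAME = "zeroeval-mcp"
SERVER_URL = "https://mcp.zeroeval.com/mcp"


def cursor_config(name: str = SERVER_NAME, url: str = SERVER_URL) -> dict:
    # Assumed shape of Cursor's mcp.json: servers keyed by name under "mcpServers".
    return {"mcpServers": {name: {"url": url}}}


def vscode_config(name: str = SERVER_NAME, url: str = SERVER_URL) -> dict:
    # Assumed shape of VS Code's mcp.json: servers under "servers", typed as "http".
    return {"servers": {name: {"type": "http", "url": url}}}


if __name__ == "__main__":
    print(json.dumps(cursor_config(), indent=2))
    print(json.dumps(vscode_config(), indent=2))
```

Generating the entries from one name and URL keeps the two editor configs from drifting apart if the endpoint changes.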
list-traces: List recent traces for the current project. Returns normalized trace objects.
get-trace: Get a single trace by ID, including its spans by default.
list-judges: List all judges (signal automations) in the current project.
get-judge: Get a single judge by ID, including linkage state and task reference.
list-judge-evaluations: List evaluations produced by a judge, with optional date and result filters.
get-judge-criteria: Get the scoring criteria for a judge.
create-judge: Create a new judge (signal automation) in the current project. Requires user confirmation.
link-judge-to-prompt: Link a judge to a prompt so it only evaluates that prompt's spans and auto-writes feedback for optimization. Requires user confirmation.
unlink-judge-from-prompt: Remove a judge's link to a prompt so it evaluates all matching spans again. Requires user confirmation.
create-judge-feedback: Submit feedback on a judge evaluation by span ID. Requires user confirmation.
list-prompts: List all prompts in the current project.
get-prompt: Get a prompt by slug, optionally at a specific version or tag.
list-prompt-versions: List all versions of a prompt.
create-prompt-feedback: Submit feedback on a prompt completion. Requires user confirmation.
list-optimization-runs: List optimization runs for a task. Use prompt slug or judge ID to find the task first.
get-optimization-run: Get details of a specific optimization run, including candidate prompt and metrics.
get-project-summary: Get a high-level summary of the current project's monitoring data.
start-prompt-optimization: Start an optimization run for a prompt task. Requires user confirmation.
start-judge-optimization: Start an optimization run for a judge. Resolves the judge's task ID automatically. Requires user confirmation.
cancel-optimization-run: Cancel a running optimization. Requires user confirmation.
preview-optimization-deploy: Preview what deploying an optimization run would do. Returns a confirmation receipt required by deploy-optimization-run.
deploy-optimization-run: Deploy a successful optimization run to production. Requires a valid receipt from preview-optimization-deploy and user confirmation.
investigate-prompt-issues: Read-only evidence assembler: gathers prompt versions, recent optimization runs, and feedback availability. Returns a summary and recommends the next primitive tool call.
investigate-judge-issues: Read-only evidence assembler: gathers judge state, evaluations, criteria, and linkage. Returns a summary and recommends the next primitive tool call.
prepare-prompt-optimization: Read-only proposal: analyzes prompt state and returns the exact start-prompt-optimization call to make. Does NOT start the run.
prepare-judge-optimization: Read-only proposal: resolves a judge's task ID and returns the exact start-judge-optimization call to make. Does NOT start the run.
list-issues: List monitoring issues for the current project. Returns detected problems from judges and deterministic detectors.
get-issue: Get a single monitoring issue by ID, including linked entity references and occurrence data.
investigate-issue: Read-only evidence assembler for monitoring issues: fetches the issue, its linked trace/span context, and judge evaluation context. Returns a compact evidence summary with recommended next actions.
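The preview-then-deploy pairing in the list above can be sketched as two MCP `tools/call` requests, where the second carries the receipt returned by the first. This is a minimal sketch that only builds the JSON-RPC payloads. The argument names `run_id` and `receipt` are hypothetical, since the tool input schemas are not given here; a real client would take them from the server's tool listing.

```python
import itertools
import json
from typing import Any

# Monotonic JSON-RPC request IDs for this sketch.
_ids = itertools.count(1)


def tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def preview_request(run_id: str) -> dict[str, Any]:
    # "run_id" is a hypothetical argument name.
    return tool_call("preview-optimization-deploy", {"run_id": run_id})


def deploy_request(run_id: str, receipt: str) -> dict[str, Any]:
    # The receipt must come from a prior preview; refuse to build a deploy without one.
    if not receipt:
        raise ValueError("deploy requires a receipt from preview-optimization-deploy")
    # "run_id" and "receipt" are hypothetical argument names.
    return tool_call("deploy-optimization-run", {"run_id": run_id, "receipt": receipt})


if __name__ == "__main__":
    print(json.dumps(preview_request("run_123"), indent=2))
    # "receipt-from-preview" stands in for the value the preview response would carry.
    print(json.dumps(deploy_request("run_123", "receipt-from-preview"), indent=2))
```

Rejecting an empty receipt on the client side mirrors the server-side rule that a deploy is only valid after a preview; the user confirmation step still happens in the MCP client and is not modeled here.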
config://server-context (server_context): Server configuration and connection status. In request_header mode, auth comes from the Authorization header.
docs://capabilities (capabilities): Canonical tool and resource inventory with annotations.
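Reading one of these resources uses the MCP `resources/read` method with the resource URI. The sketch below builds that request and the HTTP headers for request_header mode without sending anything. The `Bearer` scheme and the `api_key` parameter are assumptions, because the source only says that auth comes from the Authorization header.

```python
import json

MCP_URL = "https://mcp.zeroeval.com/mcp"


def auth_headers(api_key: str) -> dict[str, str]:
    # The "Bearer" scheme is an assumption; the docs only name the Authorization header.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # An HTTP MCP client accepts both plain JSON and event-stream responses.
        "Accept": "application/json, text/event-stream",
    }


def read_resource(uri: str, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 2.0 `resources/read` request for the given URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real credential.
    print(json.dumps(auth_headers("YOUR_API_KEY"), indent=2))
    print(json.dumps(read_resource("docs://capabilities"), indent=2))
```

A real session would perform the MCP `initialize` handshake before sending this request; an MCP client library normally handles that step.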