zeroeval-mcp (v1.0.0)

Headless MCP server for ZeroEval: inspect traces, manage judges and prompts, submit feedback, run optimizations, and safely deploy to production.

Server URL: https://mcp.zeroeval.com/mcp

Installation Guide

Choose your preferred MCP client

Install in Claude Code

Run this command in your terminal:

claude mcp add --transport http "zeroeval-mcp" https://mcp.zeroeval.com/mcp

Install in Cursor

Use the "Open in Cursor" button to add this MCP server to Cursor.

Or add it manually: Settings → MCP → Add server, with the URL https://mcp.zeroeval.com/mcp
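For a file-based manual setup, the sketch below writes a project-level Cursor config. It assumes Cursor reads .cursor/mcp.json with an mcpServers map and accepts a url entry for a remote HTTP server; check Cursor's MCP documentation for your version. It refuses to overwrite an existing file rather than trying to merge JSON in shell.

```shell
#!/bin/sh
# Sketch: register zeroeval-mcp in a project-level Cursor config.
# Assumption: Cursor reads .cursor/mcp.json and accepts a "url" entry
# for a remote (HTTP) MCP server.
set -eu

config=.cursor/mcp.json

# Do not clobber an existing config; add the entry by hand instead.
if [ -e "$config" ]; then
  echo "$config already exists; add the zeroeval-mcp entry manually" >&2
  exit 1
fi

mkdir -p .cursor
cat > "$config" <<'EOF'
{
  "mcpServers": {
    "zeroeval-mcp": {
      "url": "https://mcp.zeroeval.com/mcp"
    }
  }
}
EOF
echo "wrote $config"
```

Restart Cursor or reload its MCP settings afterwards so the new server is picked up.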

Install in VS Code

Use the "Open in VS Code" button to add this MCP server to VS Code.

Or add it manually: Settings → MCP → Add server, with the URL https://mcp.zeroeval.com/mcp
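VS Code can also load servers from a workspace file. This sketch assumes VS Code's .vscode/mcp.json format, which keys the map as servers (not mcpServers) and marks a remote server with "type": "http". VS Code Insiders should read the same workspace file, though that is worth confirming for your build.

```shell
#!/bin/sh
# Sketch: register zeroeval-mcp in a workspace-level VS Code config.
# Assumption: VS Code reads .vscode/mcp.json, keyed by "servers", with
# "type": "http" for a remote MCP server.
set -eu

config=.vscode/mcp.json

# Do not clobber an existing config; add the entry by hand instead.
if [ -e "$config" ]; then
  echo "$config already exists; add the zeroeval-mcp entry manually" >&2
  exit 1
fi

mkdir -p .vscode
cat > "$config" <<'EOF'
{
  "servers": {
    "zeroeval-mcp": {
      "type": "http",
      "url": "https://mcp.zeroeval.com/mcp"
    }
  }
}
EOF
echo "wrote $config"
```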

Install in VS Code Insiders

Use the "Open in VS Code Insiders" button to add this MCP server to VS Code Insiders.

Or add it manually: Settings → MCP → Add server, with the URL https://mcp.zeroeval.com/mcp

Connect with ChatGPT

  1. Enable Developer Mode: Settings → Connectors → Advanced → Developer mode
  2. Import this MCP server: go to the Connectors tab and add https://mcp.zeroeval.com/mcp
  3. Use in conversations: Choose the MCP server from the Plus menu

Primitives

Tools (29)
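MCP clients discover these tools over JSON-RPC 2.0. Under the MCP specification, a client first sends initialize, then the notifications/initialized notification, and only then tools/list, whose response carries each tool's name, description, and input schema. The sketch below only prints those three message bodies; the protocolVersion and clientInfo values are illustrative assumptions, and a real client uses whatever version the server negotiates in its initialize response.

```shell
#!/bin/sh
# Sketch: print the first three JSON-RPC messages an MCP client sends.
# protocolVersion and clientInfo below are illustrative assumptions.
set -eu

initialize_request() {
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"example-client","version":"0.1.0"}}}'
}

# A notification has no "id" and gets no response.
initialized_notification() {
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
}

# tools_list_request ID: asks for the tool inventory (names, schemas).
tools_list_request() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/list"}\n' "$1"
}

initialize_request
initialized_notification
tools_list_request 2
```

The tools/list result may be paginated; if it includes a nextCursor value, the client repeats the request with that cursor to fetch the rest.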

list-traces

List recent traces for the current project. Returns normalized trace objects.

get-trace

Get a single trace by ID, including its spans by default.
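Every tool on this page is invoked through the same MCP tools/call envelope; only the tool name and its arguments object change. The sketch below builds that request for get-trace. The argument name trace_id is a hypothetical placeholder, since this page does not list input schemas; read the real one from the tools/list response.

```shell
#!/bin/sh
# Sketch: build a JSON-RPC tools/call request for get-trace.
# "trace_id" is a hypothetical argument name; take the real one from the
# get-trace input schema returned by tools/list.
set -eu

# tool_call ID TOOL ARGS_JSON: prints one tools/call request on one line.
# ARGS_JSON must already be a valid JSON object; nothing is escaped here.
tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$1" "$2" "$3"
}

tool_call 3 get-trace '{"trace_id":"TRACE_ID"}'
```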

list-judges

List all judges (signal automations) in the current project.

get-judge

Get a single judge by ID, including linkage state and task reference.

list-judge-evaluations

List evaluations produced by a judge, with optional date and result filters.

get-judge-criteria

Get the scoring criteria for a judge.

create-judge

Create a new judge (signal automation) in the current project. Requires user confirmation.

link-judge-to-prompt

Link a judge to a prompt so it only evaluates that prompt's spans and auto-writes feedback for optimization. Requires user confirmation.

unlink-judge-from-prompt

Remove a judge's link to a prompt so it evaluates all matching spans again. Requires user confirmation.

create-judge-feedback

Submit feedback on a judge evaluation by span ID. Requires user confirmation.

list-prompts

List all prompts in the current project.

get-prompt

Get a prompt by slug, optionally at a specific version or tag.

list-prompt-versions

List all versions of a prompt.

create-prompt-feedback

Submit feedback on a prompt completion. Requires user confirmation.

list-optimization-runs

List optimization runs for a task. Use a prompt slug or judge ID to find the task first.

get-optimization-run

Get details of a specific optimization run including candidate prompt and metrics.

get-project-summary

Get a high-level summary of the current project's monitoring data.

start-prompt-optimization

Start an optimization run for a prompt task. Requires user confirmation.

start-judge-optimization

Start an optimization run for a judge. Resolves the judge's task ID automatically. Requires user confirmation.

cancel-optimization-run

Cancel a running optimization. Requires user confirmation.

preview-optimization-deploy

Preview what deploying an optimization run would do. Returns a confirmation receipt required by deploy-optimization-run.

deploy-optimization-run

Deploy a successful optimization run to production. Requires a valid receipt from preview-optimization-deploy and user confirmation.
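Deployment is a two-step protocol: preview-optimization-deploy returns a receipt, and deploy-optimization-run is only valid with that receipt. A client can mirror the same gate locally so it never issues a deploy without a preview. In the sketch below, run_id and receipt are hypothetical argument names, and values are inserted into the JSON unescaped, so they must not contain quotes.

```shell
#!/bin/sh
# Sketch: enforce the preview -> deploy order on the client side.
# "run_id" and "receipt" are hypothetical argument names; check the real
# input schemas of both tools via tools/list.
set -eu

tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$1" "$2" "$3"
}

# Step 1: ask the server what the deploy would do (returns a receipt).
preview_call() {
  tool_call "$1" preview-optimization-deploy "{\"run_id\":\"$2\"}"
}

# Step 2: deploy, but only when a receipt from step 1 is supplied.
deploy_call() {
  if [ -z "${3:-}" ]; then
    echo "refusing to deploy run $2 without a preview receipt" >&2
    return 1
  fi
  tool_call "$1" deploy-optimization-run "{\"run_id\":\"$2\",\"receipt\":\"$3\"}"
}

preview_call 4 RUN_ID
```

The server still checks the receipt and asks for user confirmation; this local check only stops a client from skipping the preview by mistake.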

investigate-prompt-issues

Read-only evidence assembler: gathers prompt versions, recent optimization runs, and feedback availability. Returns a summary and recommends the next primitive tool call.

investigate-judge-issues

Read-only evidence assembler: gathers judge state, evaluations, criteria, and linkage. Returns a summary and recommends the next primitive tool call.

prepare-prompt-optimization

Read-only proposal: analyzes prompt state and returns the exact start-prompt-optimization call to make. Does NOT start the run.

prepare-judge-optimization

Read-only proposal: resolves a judge's task ID and returns the exact start-judge-optimization call to make. Does NOT start the run.

list-issues

List monitoring issues for the current project. Returns detected problems from judges and deterministic detectors.

get-issue

Get a single monitoring issue by ID, including linked entity references and occurrence data.

investigate-issue

Read-only evidence assembler for monitoring issues: fetches the issue, its linked trace/span context, and judge evaluation context. Returns a compact evidence summary with recommended next actions.
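Per the descriptions above, nine tools require user confirmation and the other twenty are not marked that way. A client that auto-approves safe calls could encode that split as an allowlist, as sketched below. The names are copied from this page rather than read from the server, unknown tools fail safe to "confirm", and a real client should prefer the tool annotations the server publishes (see docs://capabilities) over a hardcoded list.

```shell
#!/bin/sh
# Sketch: classify zeroeval-mcp tools for a client-side approval prompt.
# The split mirrors this page's tool descriptions; it is not fetched from
# the server, so it can drift as tools are added or changed.
set -eu

# tool_policy NAME: prints "confirm" or "auto".
tool_policy() {
  case "$1" in
    # Described as "Requires user confirmation".
    create-judge|create-judge-feedback|create-prompt-feedback) echo confirm ;;
    link-judge-to-prompt|unlink-judge-from-prompt) echo confirm ;;
    start-prompt-optimization|start-judge-optimization) echo confirm ;;
    cancel-optimization-run|deploy-optimization-run) echo confirm ;;
    # Not marked as requiring confirmation.
    list-traces|list-judges|list-judge-evaluations|list-prompts) echo auto ;;
    list-prompt-versions|list-optimization-runs|list-issues) echo auto ;;
    get-trace|get-judge|get-judge-criteria|get-prompt) echo auto ;;
    get-optimization-run|get-project-summary|get-issue) echo auto ;;
    investigate-prompt-issues|investigate-judge-issues|investigate-issue) echo auto ;;
    prepare-prompt-optimization|prepare-judge-optimization) echo auto ;;
    preview-optimization-deploy) echo auto ;;
    # Anything unrecognized needs a human decision.
    *) echo confirm ;;
  esac
}

for tool in list-traces deploy-optimization-run some-future-tool; do
  printf '%s -> %s\n' "$tool" "$(tool_policy "$tool")"
done
```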

Resources (2)

config://server-context (server_context)

Server configuration and connection status. In request_header mode, authentication is taken from the request's Authorization header.

docs://capabilities (capabilities)

Canonical tool and resource inventory with annotations.
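Resources are fetched with an MCP resources/read request naming the URI. The sketch below prints that request for both resources and defines a curl helper for MCP's Streamable HTTP transport, which expects an Accept header listing both application/json and text/event-stream. The Bearer scheme and the ZEROEVAL_MCP_TOKEN variable are assumptions (this page only says that, in request_header mode, auth comes from the Authorization header). A real session must run the initialize handshake first and send back any Mcp-Session-Id the server returns, so the helper only runs when both a token and a session ID are set.

```shell
#!/bin/sh
# Sketch: read zeroeval-mcp resources over MCP's Streamable HTTP transport.
# ZEROEVAL_MCP_TOKEN and the Bearer scheme are assumptions, not documented.
set -eu

MCP_URL=https://mcp.zeroeval.com/mcp

# read_resource ID URI: prints one JSON-RPC resources/read request.
read_resource() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"resources/read","params":{"uri":"%s"}}\n' \
    "$1" "$2"
}

# post_mcp BODY: POSTs one JSON-RPC message. MCP_SESSION_ID, if set, must be
# the session ID the server returned in response to initialize.
post_mcp() {
  body=$1
  set -- -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -H "Authorization: Bearer $ZEROEVAL_MCP_TOKEN"
  if [ -n "${MCP_SESSION_ID:-}" ]; then
    set -- "$@" -H "Mcp-Session-Id: $MCP_SESSION_ID"
  fi
  curl -sS -X POST "$MCP_URL" "$@" --data "$body"
}

read_resource 10 config://server-context
read_resource 11 docs://capabilities

# Touch the network only with a token and an already-initialized session.
if [ -n "${ZEROEVAL_MCP_TOKEN:-}" ] && [ -n "${MCP_SESSION_ID:-}" ]; then
  post_mcp "$(read_resource 11 docs://capabilities)"
fi
```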