Short version: On a real federated graph with roughly two thousand types, gqlens's search surfaced the correct answer path in its very first response. Apollo MCP's search missed a key branch of the schema on the same question, and the agent had to drill down manually to recover it. Both tools work. They work differently. This is my first real test, and honestly it went better than I expected.
Why I ran this
I launched gqlens this week. Apollo has shipped its own GraphQL MCP server for a while. Since I'm claiming gqlens is better at one specific thing (helping AI agents navigate a large schema they have never seen), I owe you a direct comparison with real numbers rather than adjectives.
To be upfront: this is the first time I've sat down and benchmarked the two tools side by side. I wasn't sure what I would find. One test is not a benchmark, and I'll get to what a real benchmark looks like at the end of this post. But the result was clear enough that I wanted to share it while it's fresh.
The setup
- Schema: a production federated GraphQL graph with roughly 2,000 types across several hundred domains. Enterprise supply-chain domain. The specific graph is anonymized, and type and field names in this post have been renamed while preserving the shape, depth, and behavior of the real queries.
- Client: Claude Opus via Claude Code, with both MCP servers registered at the same time so the model could pick either tool freely.
- Tools: both servers exposed their default tools (search, introspect, validate, and so on).
- Questions: two realistic developer questions against a graph the assistant had not seen before.
Neither MCP was pre-loaded, hinted at, or prompt-engineered. I asked the question and watched what each tool did.
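For anyone reproducing the setup: registering both servers side by side usually comes down to one client config entry per server. The snippet below is a sketch in the `mcpServers` shape that MCP clients such as Claude Code read from `.mcp.json`; the URLs are placeholders, and the exact keys vary by client, so check your client's docs before copying it.

```json
{
  "mcpServers": {
    "gqlens": {
      "type": "http",
      "url": "https://gqlens.com/mcp/your-generated-id"
    },
    "apollo": {
      "type": "http",
      "url": "https://your-apollo-mcp-host/mcp"
    }
  }
}
```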
Question 1: "How do I get current inventory for my items on a site?"
A typical week-one question from a developer new to the codebase. The answer lives several levels deep, under an analytics sub-tree.
gqlens returned a search response that ranked the correct sub-tree near the top, including the fully-qualified path to the field the question was really about. The agent introspected a couple of return types, built the query, and validated it.
Apollo MCP returned a search response with ranked types at the top of the graph, but the specific sub-tree where the answer lived did not surface. The agent had to guess at likely parent types and introspect each one until it found the branch, then continue down to the leaf.
Both tools reached a valid query in the end. gqlens did it in fewer round trips. Apollo got there after an introspection chase.
Question 2: "Under warehouse, there should be a way to get inventory"
A more precise follow-up. The developer has a hunch where the field lives but not the exact name. This is the case where the tools diverged most clearly.
What gqlens search returned
Among other results, the search call surfaced this section directly:
```
D:10
query.warehouse.analytics.inventory.currentInventory [match:currentInventory]
query.warehouse.analytics.inventory.availableInventory [match:availableInventory]
```
That is the answer. A fully-qualified path, returned by the first search call, without the agent having to guess which intermediate type to introspect. From there the agent introspected the leaf return type, built the query, and validated it. Five tool calls end to end.
What Apollo MCP search returned
Apollo's search returned a type summary for WarehouseQuery, the entry point the question asked about. The field the agent actually needed, analytics: WarehouseAnalyticsQuery!, was not in the ranked results.
I confirmed via a follow-up introspect call with depth=2 that the field does exist in the schema. Apollo's search simply didn't surface it. To reach the leaf, the agent had to introspect WarehouseQuery, then WarehouseAnalyticsQuery, then InventoryAnalyticsQuery, then the return type. About the same total tool calls as gqlens, but with more guessing at each step.
Side by side
| Metric | gqlens | Apollo MCP |
|---|---|---|
| Search surfaced the exact answer path | Yes, in deep matches | No, the analytics branch was missing |
| Agent had to guess intermediate types | No | Yes |
| Introspection calls after search | 3 | 3 |
| Total tool calls end to end | 5 | about 6 |
| Final validated query | Same | Same |
The raw call counts are close. The thing that matters, to me, is whether the first search response already contains the answer or whether the agent has to hunt for it. On this question gqlens's search did. Apollo's did not.
The final query (validated against the schema on both servers)
```graphql
query CurrentInventoryUnderWarehouse($siteId: String!) {
  warehouse {
    analytics {
      inventory {
        currentInventory(siteId: $siteId) {
          siteId
          skuId
          quantity { value unit }
          locations {
            locationId
            quantity { value unit }
          }
        }
      }
    }
  }
}
```
Both tools can build this. Only one surfaced the path without the agent guessing.
Why the gap exists
Apollo's MCP search ranks types and fields by name and description match. That's a solid baseline, and it works well on small schemas and shallow queries. It breaks down when the answer lives three or four hops into a subtree and nothing in the top-level type name hints at the branch below.
gqlens's search does one extra thing: it traverses the schema graph from the root operation types and returns fully-qualified paths whose leaf field matches the query. query.warehouse.analytics.inventory.currentInventory shows up because currentInventory matches the intent, even though nobody searched for "warehouse."
That extra pass is what earns its keep on large federated graphs. Without it, the agent has to guess intermediate type names before it can reach the leaf, and that's exactly what turned a one-call question into a multi-call scavenger hunt.
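To make the difference concrete, here is a toy sketch of the two strategies. This is my own illustration over a hand-written mini schema, not gqlens's or Apollo's actual code: the flat search matches field names in isolation, while the path-indexed search walks the schema graph from the root operation type and returns fully-qualified paths whose leaf matches the term.

```python
from collections import deque

# Toy schema: type name -> {field name: return type name}. Names are hypothetical.
SCHEMA = {
    "Query": {"warehouse": "WarehouseQuery"},
    "WarehouseQuery": {"analytics": "WarehouseAnalyticsQuery", "sites": "Site"},
    "WarehouseAnalyticsQuery": {"inventory": "InventoryAnalyticsQuery"},
    "InventoryAnalyticsQuery": {
        "currentInventory": "InventoryRecord",
        "availableInventory": "InventoryRecord",
    },
    "Site": {},
    "InventoryRecord": {},
}

def search_flat(schema, term):
    """Name-match-style search: matching fields found, but no path context."""
    term = term.lower()
    return sorted(
        f"{type_name}.{field}"
        for type_name, fields in schema.items()
        for field in fields
        if term in field.lower()
    )

def search_paths(schema, term, root="Query", max_depth=6):
    """Walk from the root operation type and return fully-qualified
    dotted paths whose leaf field matches the term."""
    term = term.lower()
    hits, queue = [], deque([(root, "query", 0)])
    while queue:
        type_name, prefix, depth = queue.popleft()
        if depth >= max_depth:
            continue  # guard against deep or cyclic federated graphs
        for field, ret_type in sorted(schema.get(type_name, {}).items()):
            path = f"{prefix}.{field}"
            if term in field.lower():
                hits.append(path)
            queue.append((ret_type, path, depth + 1))
    return hits

print(search_paths(SCHEMA, "inventory"))
```

The flat search still finds `currentInventory`, but it leaves the agent to discover that the field hangs under warehouse, then analytics, then inventory; the path search hands back that route in a single response, which is the behavior the search output above shows.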
What this test does not prove
A few honest caveats, because one test is not a benchmark:
- One schema is not enough. I ran this on one graph in one domain. Another schema may have a shape where the gap closes or widens.
- Two questions is not a question set. Real developer work includes shallow lookups, mutations, cross-domain joins, and questions where the correct answer is "this does not exist." We only covered one shape.
- One model is not a population. This ran on Claude Opus. A smaller model or a different vendor may route tool calls differently.
- Apollo MCP does other things well. Execution tooling and native integration with the Apollo GraphOS stack are more mature than anything gqlens ships today. If your agent mostly executes known queries against a graph your team already understands, that's Apollo's strength, not mine.
- Apollo can reach the same answer. It took more introspection, but nothing was unreachable. Please don't read this post as "Apollo doesn't work." It works. It's tuned for a different starting point.
What we're building next: a real benchmark
This post is the first data point, not the last word. The next thing I'm working on is a proper evaluation:
- Multiple federated schemas of different shapes and domains (supply chain, retail, fintech, media) so the result is not a one-graph artefact.
- A curated question set with graded difficulty: shallow lookups, deep subtree traversal, enum and filter questions, cross-domain joins, and questions where the correct answer is "this does not exist in the schema."
- Metrics beyond tool-call count. First-try compile rate, hallucinated-field rate, time to valid query, and total token cost are the ones I care about most.
- Public methodology and dataset. If the benchmark is worth anything, anyone should be able to re-run it, add their own schema, and publish their own numbers.
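As a sketch of how I plan to score runs, here is the shape of record I have in mind and the first few metrics computed from it. The field names and the sample numbers are placeholders for illustration, not measured data:

```python
# Placeholder trial records: one per (tool, question) run. Values are illustrative.
trials = [
    {"tool": "tool_a", "compiled_first_try": True,  "hallucinated_fields": 0, "tool_calls": 5},
    {"tool": "tool_b", "compiled_first_try": True,  "hallucinated_fields": 1, "tool_calls": 6},
    {"tool": "tool_b", "compiled_first_try": False, "hallucinated_fields": 2, "tool_calls": 9},
]

def summarize(trials, tool):
    rows = [t for t in trials if t["tool"] == tool]
    n = len(rows)
    return {
        # Share of runs whose first generated query validated against the schema.
        "first_try_compile_rate": sum(t["compiled_first_try"] for t in rows) / n,
        # Share of runs that referenced at least one field not in the schema.
        "hallucinated_field_rate": sum(t["hallucinated_fields"] > 0 for t in rows) / n,
        "mean_tool_calls": sum(t["tool_calls"] for t in rows) / n,
    }

print(summarize(trials, "tool_b"))
```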
If you run a GraphQL API and are open to letting me include your schema (public or under NDA), I would love to hear from you. Same if you have strong opinions about the question set. A benchmark is only useful if the questions reflect the work developers actually do.
Try it yourself
Both MCPs are free to install:
- Apollo MCP: follow Apollo's docs for setup against your graph.
- gqlens: add your GraphQL source at gqlens.com and copy the generated MCP URL into your client. No local install, no config file.
Register both at the same time. Ask your AI assistant a question that lives several hops deep in a subtree. Count the tool calls. Look at whether the correct path shows up in the first search response. That's the signal worth watching.
Where this takes us
The schema-understanding problem is not solved. Execution is the easy half. There are a dozen ways to POST a GraphQL document once you know what to send. The hard half is the first call, when the agent is staring at a 30,000-line schema it has never seen and has to decide which three fields to ask about.
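Since the easy half is genuinely easy, here is one of those dozen ways, sketched with only the Python standard library. The endpoint URL is a placeholder and the request is built but deliberately not sent; the payload shape (query plus variables) is the common GraphQL-over-HTTP convention.

```python
import json
import urllib.request

QUERY = """
query CurrentInventoryUnderWarehouse($siteId: String!) {
  warehouse { analytics { inventory {
    currentInventory(siteId: $siteId) { siteId skuId quantity { value unit } }
  } } }
}
"""

def build_request(url, query, variables):
    """Package a GraphQL document as a standard JSON-over-HTTP POST."""
    payload = json.dumps({"query": query, "variables": variables}).encode()
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_request("https://example.com/graphql", QUERY, {"siteId": "SITE-1"})
# urllib.request.urlopen(req)  # not executed here: the endpoint is a placeholder
```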
That's the half gqlens is built around. If you're running an MCP against a small, stable, well-known schema, the choice does not matter much. If you're running one against a large federation your team is still learning, try both and watch the tool-call traces. The difference is visible in minutes.
If you do run this on your own graph, send me the results. I'll either share the numbers publicly or fix whatever the test exposes. And when the proper benchmark is ready, you'll be the first to see it.