How Ready Is Your
Website for AI Agents?
Submit your site and our experts will analyze your AI agent readiness, then walk you through personalized recommendations on a free call — no strings attached.
Get Your Free Analysis
Our experts will analyze your AI agent readiness and walk you through personalized recommendations — completely free.
$ npx webmcp-cli audit yoursite.com

Scoring Framework
Five Dimensions. One Score.
Your Agent Readiness Score is a weighted composite of five categories — each mapped directly to the WebMCP protocol specification. No black boxes. Every point measures something real.
Implementation
What We Check
- Presence of navigator.modelContext.registerTool() calls or declarative HTML form attributes (toolname, tooldescription)
- inputSchema completeness: every parameter has a type, description, and appropriate constraints
- Description field quality: specific action verbs, positive framing, clear scope
- Required vs. optional parameter classification
- provideContext() usage for dynamic tool availability in SPAs
- Feature detection guards: if ("modelContext" in window.navigator)
Business Impact
If this score is low, AI agents literally cannot see or use your tools. It's like having a store with no door.
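The Implementation checks above can be sketched as a single guarded registration. The registerTool() call, inputSchema fields, and feature-detection guard follow the checklist; the tool name, the pattern constraint, and the stub execute body are hypothetical examples, not spec requirements:

```javascript
// Sketch of an imperative WebMCP tool registration with a complete schema.
const trackOrderTool = {
  name: "track-order", // specific action verb, per the naming guidance
  description: "Looks up the current shipping status of an order by its order number.",
  inputSchema: {
    type: "object",
    properties: {
      orderNumber: {
        type: "string",
        description: "The order number from the confirmation email, e.g. 'A-10293'",
        pattern: "^[A-Z]-\\d+$", // constraint: assumed format, for illustration only
      },
    },
    required: ["orderNumber"],
  },
  async execute({ orderNumber }) {
    // Placeholder lookup; a real tool would call your order API here.
    return { content: [{ type: "text", text: `Order ${orderNumber}: in transit` }] };
  },
};

// Feature-detection guard: safe in browsers without WebMCP (and outside browsers).
if (typeof navigator !== "undefined" && "modelContext" in navigator) {
  navigator.modelContext.registerTool(trackOrderTool);
}
```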
Prompt Coverage
What We Check
- 50–200 natural language prompts generated per tool, covering formal, casual, multilingual, and ambiguous phrasings
- Each prompt tested against Gemini, GPT, and Claude simultaneously via function-calling
- Tool routing accuracy: did the model pick the correct tool?
- Parameter extraction accuracy: did it fill the right values?
- Hallucination rate: did the model invent parameters not in the prompt?
- Disambiguation: when tools have similar descriptions, did the model get confused?
Business Impact
This is the difference between "my tools exist" and "my tools actually get used." A site with perfect Implementation but 40% Prompt Coverage means agents find your tools but fail to use them correctly 60% of the time.
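The three accuracy metrics above can be tallied from per-prompt results. A minimal sketch, assuming a hypothetical result-record shape (the field names are illustrative, not the product's actual report format):

```javascript
// Compute routing accuracy, parameter-extraction accuracy, and hallucination
// rate from an array of per-prompt test results.
function coverageMetrics(results) {
  const total = results.length;
  // Routing: did the model pick the expected tool?
  const routed = results.filter((r) => r.selectedTool === r.expectedTool).length;
  // Extraction: right tool AND right parameter values.
  const filled = results.filter(
    (r) => r.selectedTool === r.expectedTool && r.paramsCorrect
  ).length;
  // Hallucination: the model invented parameters not present in the prompt.
  const hallucinated = results.filter((r) => r.inventedParams).length;
  return {
    routingAccuracy: routed / total,
    extractionAccuracy: filled / total,
    hallucinationRate: hallucinated / total,
  };
}
```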
Security
What We Check
- Tool poisoning / injection surface: scans descriptions for hidden instructions, Unicode tricks, and unusually long descriptions
- Over-parameterization: classifies parameters by sensitivity and flags unnecessary High/Critical parameters
- Misrepresentation of intent: detects gaps between tool description and execute function behavior
- Missing requestUserInteraction(): flags destructive or financial tools without human approval
- Output injection risk: checks if execute functions return unsanitized user-generated content
Business Impact
A single security vulnerability can expose customer data, enable unauthorized purchases, or allow prompt injection attacks. For regulated industries, this component determines whether compliance teams approve WebMCP adoption.
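The missing-approval check above (destructive tools without human confirmation) can be addressed with a wrapper. In this sketch, the spec's requestUserInteraction() is modeled by an injected requestApproval callback, since its exact signature isn't shown on this page; the { content, isError } result shape is likewise an assumption:

```javascript
// Gate a destructive tool's execute behind explicit human approval.
function withApproval(execute, requestApproval) {
  return async (args) => {
    const approved = await requestApproval(
      `Confirm this action for order ${args.orderId}?`
    );
    if (!approved) {
      // Structured refusal, not a thrown exception.
      return {
        content: [{ type: "text", text: "Action declined by the user." }],
        isError: true,
      };
    }
    return execute(args);
  };
}
```

The same wrapper can be reused for any financial or destructive tool, keeping the approval step out of each tool's business logic.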
Reliability
What We Check
- Error handling: execute functions return structured error content, not unhandled exceptions
- Execution timing: P50/P95/P99 response times. Tools >3 seconds degrade agent experience
- SubmitEvent.agentInvoked handling: forms detect agent-initiated submissions correctly
- CSS pseudo-class support: :tool-form-active and :tool-submit-active for visual feedback
- toolactivated / toolcancel event handling for lifecycle management
- Graceful degradation: tools function when WebMCP is unavailable
Business Impact
Reliability failures are invisible until they happen in production. A 5% failure rate across thousands of daily agent interactions means hundreds of broken experiences.
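The first reliability check, returning structured error content instead of throwing, can be sketched like this. The fetcher is injected to keep the example self-contained, and the { content, isError } result shape is an assumption based on the checklist above:

```javascript
// Build an execute function that never throws: every failure path returns
// structured error content an agent can read and recover from.
function makeSearchExecute(fetchResults) {
  return async ({ query }) => {
    try {
      const data = await fetchResults(query);
      return { content: [{ type: "text", text: JSON.stringify(data) }] };
    } catch (err) {
      // Structured error instead of an unhandled exception.
      return {
        content: [{ type: "text", text: `Search unavailable: ${err.message}` }],
        isError: true,
      };
    }
  };
}
```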
Best Practices
What We Check
- Naming conventions: specific action verbs (search-flights, book-hotel) not vague ones (handle-request)
- Execution vs. initiation clarity: description states whether tool acts immediately or starts a process
- Atomic tool design: each tool does one thing. Complex operations composed from multiple tools
- Schema design: tools accept raw user input (city names, not airport codes)
- Annotation completeness: readOnlyHint, destructiveHint, idempotentHint, openWorldHint
- Context management: provideContext() used on state transitions, stale tools cleaned up
Business Impact
Best practices don’t affect whether your tools work today — they affect whether they work well across different AI models, whether they’re maintainable as your site evolves, and whether agents can compose them into multi-step workflows.
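A tool definition that satisfies the checklist above might look like this sketch. The annotation names (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) come from the checklist; the surrounding object shape and field values are illustrative assumptions:

```javascript
// An atomic, well-annotated search tool: one job, raw user input, full hints.
const searchHotelsTool = {
  name: "search-hotels", // specific action verb, not "handle-request"
  description:
    "Searches available hotels in a city for the given dates. Returns results only; it does not book.",
  inputSchema: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name as the user typed it, e.g. 'Paris'" },
      checkIn: { type: "string", description: "Check-in date, ISO 8601 (YYYY-MM-DD)" },
      checkOut: { type: "string", description: "Check-out date, ISO 8601 (YYYY-MM-DD)" },
    },
    required: ["city", "checkIn"],
  },
  annotations: {
    readOnlyHint: true, // searching never mutates state
    destructiveHint: false,
    idempotentHint: true, // same query, same results
    openWorldHint: true, // results come from an external inventory
  },
};
```

Booking would be a separate tool, so complex trips compose from atomic steps rather than one overloaded "handle-request" endpoint.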
Illustrative example — scores vary by site
Each component is scored independently on a 0–100 scale, then combined using the weights above to produce your composite Agent Readiness Score.
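As a sketch of that combination step: a weighted sum over the five 0-100 component scores. Only Implementation's 25% weight is published on this page; the other four weights here are illustrative placeholders, not the product's actual values:

```javascript
// Hypothetical weights. Implementation's 0.25 matches the methodology table;
// the remaining four are placeholder assumptions that sum to 1.0.
const WEIGHTS = {
  implementation: 0.25,
  promptCoverage: 0.25,
  security: 0.2,
  reliability: 0.15,
  bestPractices: 0.15,
};

// Composite Agent Readiness Score: weighted sum of the five components.
function compositeScore(components) {
  return Math.round(
    Object.entries(WEIGHTS).reduce((sum, [key, w]) => sum + w * components[key], 0)
  );
}
```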
Score Interpretation
What Your Score Means
Your score isn’t just a number — it predicts how AI agents will behave on your site. Lower scores mean agents struggle, fail, or choose your competitors instead.
What It Means
No meaningful WebMCP implementation. Either no registerTool() calls or declarative form attributes were detected, or critical schema errors prevent any tool from being usable.
What Agents Do
Agents bypass your site entirely. When a user asks their AI assistant to interact with your service, the agent either scrapes your UI (unreliable, slow) or routes to a competitor that has proper tools.
What You Should Do
Install the Web-MCP CLI (npx webmcp-cli audit) and follow the Quick Start guide. Most sites jump to 40+ within a day of adding basic tool registrations.
What It Means
Tools exist but have significant gaps. Common issues: missing parameter descriptions, vague tool names, incomplete schemas, no error handling in execute functions.
What Agents Do
Agents find your tools but frequently fail. They select the wrong tool, fill parameters incorrectly, or encounter errors mid-task. Users retry manually.
What You Should Do
Focus on the recommendations in your score report. Most issues are fixable in hours: add descriptions, specify parameter types, use explicit naming. Run npx webmcp-cli lint for instant wins.
What It Means
Solid foundation. Tools work for common use cases. Gaps appear in edge cases: multilingual prompts, ambiguous requests, adversarial inputs. Security may have unreviewed exposure.
What Agents Do
Agents succeed on straightforward tasks but fail on complex or unusual requests. You’re functional but not optimized — agents may prefer a competitor’s tools when both are available.
What You Should Do
Run multi-model prompt coverage testing to find specific phrasings that fail. A/B test your tool descriptions. Review security scan findings. You’re ahead of ~60% of sites.
What It Means
Well-implemented across all five dimensions. High prompt coverage across models, solid security posture, reliable execution. Top quartile.
What Agents Do
Agents reliably complete tasks on your site. In competitive scenarios, you win most of the time. Multi-step workflows succeed consistently.
What You Should Do
Use competitive benchmarking to track your position vs. specific competitors. Set up continuous monitoring to catch regressions. Display your Agent Readiness Badge.
What It Means
Top-tier implementation. Agents prefer your tools over alternatives. Comprehensive coverage across all models, languages, and edge cases. Robust security. Exemplary spec compliance.
What Agents Do
Agents actively prefer your site. When presented with similar tools from multiple sources, agents select yours due to superior descriptions, schema quality, and reliability history.
What You Should Do
You’re setting the standard for your industry. Publish your score. Get Gold Certified. Monitor for regressions with CI/CD integration.
Your score updates every time you scan. Consistent implementation work typically leads to meaningful improvement over time.
Two Tiers
Free Instant Audit. Or the Full Picture.
Quick Score tells you where you stand. Deep Score tells you exactly why — and precisely how to win.
Quick Score
~60 seconds
Static analysis only. Loads your page in a headless browser, detects the WebMCP API, parses tool registrations, validates schemas, runs security heuristics, and checks spec compliance. No LLM calls.
- Full Implementation analysis
- Estimated Prompt Coverage from heuristics
- Static injection surface scan
- Inferred Reliability from code patterns
- Full Best Practices check
- General improvement guidance
- Your score vs. industry average
- Single page scan
Deep Score
3–8 minutes
Everything in Quick Score plus: generates 50–200 natural language prompts per tool, tests each across Gemini, GPT, and Claude simultaneously, measures routing accuracy and hallucination rates, and executes tools in a sandboxed browser.
- Full Implementation analysis
- Actual LLM testing across 3 models
- Full security scan + dynamic analysis
- Actual tool execution in sandbox
- Full spec check + behavioral verification
- Code-level fixes with before/after examples
- Industry percentile + competitor comparison
- Multi-page crawl across your site
- Full history with trend analysis and alerts
87% of users who run a Quick Score come back for the Deep Score within a week.
Full Transparency
How We Calculate Your Score
No black box. Every check maps to the WebMCP specification. Here’s exactly what we measure, how we measure it, and why it matters.
Implementation
25% of total score

| Check | What We Check | Impact | Tier |
|---|---|---|---|
| IMPL-01 | WebMCP API presence | Critical — gating check | Free |
| IMPL-02 | Tool registration method | High — at least one method required | Free |
| IMPL-03 | inputSchema completeness | High — per-parameter scoring | Free |
| IMPL-04 | Description quality | Medium — impacts Prompt Coverage heavily | Free |
| IMPL-05 | Parameter descriptions | Medium | Free |
| IMPL-06 | provideContext() usage | Medium — critical for SPAs | Free |
| IMPL-07 | Feature detection | Low — critical for production | Free |
Spec references link to the WebMCP protocol documentation when publicly available.
Projected Benchmarks
Your Score in Context
Scores shown are projected benchmarks based on our scoring methodology applied to representative sites in each industry. Actual industry averages will update as more sites are scored.
Why Tool Quality Matters
When an AI agent has access to tools from multiple sites simultaneously — your flight search and a competitor’s — it doesn’t flip a coin. It evaluates tool quality: description clarity, schema completeness, parameter precision, and historical reliability.
Better-defined tools are more likely to be selected by AI agents.
In preliminary testing, LLMs consistently prefer tools with clearer descriptions and better parameter naming. Higher-scored implementations correlate with more reliable agent interactions.
Benchmarks are illustrative projections, not aggregate data from scored sites.
Continuous Monitoring
Track Every Point of Progress
Your score isn’t a one-time snapshot. It’s a living metric that updates as you implement changes, and alerts you the moment something regresses.
Score Trend (8 Weeks)
Milestones
Automatic Rescanning
Score recalculated weekly (Pro) or on-demand. Tracks every change.
Regression Detection
If your score drops even 1 point, you get an alert explaining what changed.
Milestone Tracking
Every score change logged with the cause and exact point impact.
Component Trends
Five independent trend lines show which areas are improving or stagnating.
Your score updates every time you scan. Track improvements as you implement changes and optimize tool definitions.
Trend data shown above is an illustrative example, not real aggregate data.
Actionable Fixes
Your Score Comes With a Roadmap
Every point deducted maps to a specific issue with a specific fix. We don’t just tell you what’s wrong — we tell you exactly how to make it right.
Public Proof
Certify Your Agent Readiness
Display your score on your website. Show visitors, partners, and AI agents that your site is built for the agentic web.
<!-- Web-MCP Certified Pro Badge -->
<a href="https://web-mcp.net/score/yoursite.com"
target="_blank" rel="noopener">
<img src="https://badge.web-mcp.net/yoursite.com"
alt="Agent Readiness Score: Certified Pro"
width="200" height="40" />
</a>

For Your Site
Signals to partners and customers that you're prepared for the AI agent era. Differentiates you from competitors.
For Agencies
Every client site with a badge is a portfolio piece. Certification becomes a tangible deliverable you can offer clients.
For the Ecosystem
Public scores create accountability and transparency. Certified sites can be listed in the Discovery Hub.
Developer-First
Your Score, Wherever You Work
Terminal. Browser. CI pipeline. REST API. The Agent Readiness Score integrates into every developer workflow.
$ npx webmcp-cli audit https://yoursite.com

Web-MCP.net v1.0.0 — Agent Readiness Score
Scanning https://yoursite.com...
✓ WebMCP API detected (navigator.modelContext)
✓ Found 7 registered tools (5 imperative, 2 declarative)
✓ Validating schemas against JSON Schema draft-07...
✓ Running security analysis...
✓ Checking best practices compliance...

┌───────────────────────────────────────────┐
│ AGENT READINESS SCORE: 73 / 100           │
│ ══════════════════════════════            │
│ Implementation:  85  ████████░░ (+12)     │
│ Prompt Coverage: 72* ███████░░░ (est)     │
│ Security:        68  ██████░░░░           │
│ Reliability:     71  ███████░░░           │
│ Best Practices:  61  ██████░░░░           │
│                                           │
│ Industry: E-Commerce • Top 22%            │
│ 3 Critical issues found                   │
└───────────────────────────────────────────┘
Install globally or use npx. Get your score from the terminal in seconds.
Capabilities
- npx webmcp-cli audit — Full Quick Score
- npx webmcp-cli lint — Schema linting with auto-fix
- npx webmcp-cli audit --deep — Deep Score with LLM testing
- npx webmcp-cli compare <url> — Side-by-side comparison
- npx webmcp-cli ci --min-score 75 — CI/CD mode
All interfaces share the same scoring engine. Your score is consistent whether you check from the terminal, browser, or API.
For You
Built for Every Team That Touches the Web
For Developers & Engineering Teams
Your Question
“What exactly does the score measure, and can I trust it?”
The Answer
Every check maps to the WebMCP specification. We validate your registerTool() calls, test your inputSchema against JSON Schema draft-07, scan for injection vulnerabilities documented in the WebMCP Security spec, and test prompt routing across Gemini, GPT, and Claude.
Your Workflow
1. npx webmcp-cli audit https://yoursite.com
2. npx webmcp-cli lint
3. npx webmcp-cli ci --min-score 75
Features That Matter
- Full methodology transparency
- CLI + API + CI/CD integration
- Code-level recommendations with before/after
- Schema linting with auto-fix suggestions
For CTOs, VPs & Executives
Your Question
“How do we compare to competitors, and what’s the business case?”
The Answer
Your Agent Readiness Score determines whether AI agents succeed on your site — or route to competitors. We benchmark you against your industry vertical and show your percentile ranking. Higher scores mean more agent traffic and more conversions.
Your Workflow
1. Run a Quick Score on your site and top 3 competitors
2. Review industry benchmark report
3. Present prioritized roadmap to engineering team
Features That Matter
- Industry benchmarks & competitive ranking
- Score trends with regression alerts
- Revenue attribution in Analytics module
- Exportable reports for board presentations
For Agencies & Consultants
Your Question
“Can I use this to win and deliver client engagements?”
The Answer
The Agent Readiness Score is your client deliverable. "Before Web-MCP: 18. After: 83." That screenshot goes on the invoice.
Your Workflow
1. Scan client site → Show them their score
2. Present recommendations → Scope the engagement
3. Implement fixes → Rescan → Show improvement
4. Install badge → Set up monitoring → Retainer
Features That Matter
- Score as sales tool (scan prospects for free)
- Exportable PDF reports (white-label ready)
- Badge on every client site (portfolio + backlinks)
- Bulk scanning across client portfolio
WebMCP Is Live in Chrome 146.
See Where Your Site Stands.
Get a free expert analysis of your site's agent readiness and actionable recommendations to improve.
Get Your Free Analysis
Submit your site and book a free walkthrough call with our team.
$ npx webmcp-cli audit yoursite.com

Install the browser extension. See scores as you browse.
Install Extension →