The Accuracy Gap: What AI Visibility Tools Don't Tell You

By Findcraft · Original Research · 9 min read

A growing market of AI visibility tools and agencies — platforms like Otterly and Profound, agencies like GrowthX, and dozens more — helps businesses track whether AI mentions them. They monitor brand coverage across ChatGPT, Perplexity, Google AI, and Copilot. They count citations, track prompt rankings, and chart visibility trends over time.

They all share one assumption: more mentions are better.

But what if the mentions are wrong? What if ChatGPT is recommending your business with the wrong address, fabricated menu prices, or a location that doesn't exist? Every visibility tool in the market will tell you that you're being mentioned. None will tell you that what AI says about you is false.

This isn't a hypothetical concern. Lily Ray's research tracking 11 websites showed that AI citation patterns cascade directly from organic search signals — but the accuracy of what AI says once it cites a business is an entirely separate question that nobody was measuring. The hallucination detection market is already valued at $2.47 billion in 2026, growing at 33.2% CAGR (Precedence Research). The demand for accuracy verification exists. The tools for brand-specific accuracy checking do not — at least, not in the current visibility stack.

We call this the Accuracy Gap — the space between visibility and truth. It's the most important question in AI search that nobody's tool is designed to answer.

Diagram showing the Accuracy Gap: visibility tools monitor whether AI mentions a business, but miss whether what AI says is factually correct
The Accuracy Gap: visibility tools see the left side. Nobody sees the right — until now.

What we tested

On 14 April 2026, we ran two tools against the same London restaurant: Da Mario, a well-known Italian restaurant in South Kensington with a clear factual profile — specific address, published menu, established opening hours.

Tool 1: Otterly.ai — a leading AI visibility monitoring platform (14-day free trial, paid tiers from $29 to $489/month). We created a brand report for Da Mario and examined every dashboard section.

Tool 2: Our Accuracy Engine (M.A.R.C. methodology) — a three-tier accuracy verification system that extracts factual claims from AI responses and checks each one against verified ground truth data.

Same business. Same AI platforms. Same day. Completely different findings.

What Otterly sees

Otterly's dashboard for Da Mario showed exactly what a visibility tool is designed to show. Across four sections — Overview, Prompts, Citations, and Recommendations — it tracked:

  • 10 AI prompts monitored automatically (generated from the business URL)
  • 63 URLs cited by AI engines when answering related queries
  • Brand mention tracking: Da Mario had low visibility — "Brand Mentioned: No" on most prompts
  • Competitor tracking showing which other restaurants AI recommends instead

Otterly's conclusion: Da Mario has low AI visibility. Its recommendation: optimise for better coverage.

What was absent from every section of the dashboard: any column, metric, flag, or indicator for factual accuracy. No check for whether AI's claims about Da Mario are true. No hallucination detection. No error rate. This is consistent across all three paid tiers ($29, $189, and $489/month) — accuracy checking is not available at any price point.

What M.A.R.C. sees

The same business, examined for accuracy rather than visibility, revealed a different picture entirely.

M.A.R.C. Accuracy Score: 45/100. Our engine extracted 57 distinct factual claims from AI responses about Da Mario, verified 24 of them against ground truth data (42% verification coverage), and classified each one; a sketch of the scoring arithmetic follows the list below.

  • 8 claims verified as accurate — correct address, confirmed services
  • 6 inaccuracies identified — wrong breakfast times, incorrect details
  • 10 hallucinations — completely fabricated information with no basis in reality
  • 33 claims unverifiable — our ground truth data didn't cover these topics
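To make the arithmetic concrete, here is a minimal scoring sketch. The penalty weights are our assumptions for this post, not the published M.A.R.C. formula, though they happen to land close to the reported 45/100:

```python
def accuracy_score(accurate: int, inaccurate: int, hallucinated: int) -> float:
    """Illustrative 0-100 score over verified claims only.

    Assumed weights: a hallucination counts fully against the score,
    an inaccuracy counts half. The real M.A.R.C. formula may differ.
    """
    verified = accurate + inaccurate + hallucinated
    if verified == 0:
        return 0.0
    penalty = 0.5 * inaccurate + 1.0 * hallucinated
    return round(100 * max(0.0, (verified - penalty) / verified), 1)

# Da Mario's verified-claim counts from the scan above:
print(accuracy_score(accurate=8, inaccurate=6, hallucinated=10))  # ~45.8
```

Unverifiable claims are excluded from the denominator here; weighting them differently would change the score, which is exactly why the 42% coverage figure matters.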

The hallucinations were striking. AI fabricated a Covent Garden location for Da Mario — the restaurant is in South Kensington, and has never had a Covent Garden branch. AI invented specific breakfast service times. It made up menu prices for dishes that don't appear at those prices on the actual menu.

None of this appeared anywhere in Otterly's dashboard. The visibility tool saw Da Mario being mentioned. Our accuracy engine saw that when AI mentions Da Mario, a significant portion of what it says is wrong.

Side-by-side comparison: Otterly sees 10 prompts, 63 URLs, and low visibility. M.A.R.C. sees 57 claims, 6 inaccuracies, 10 hallucinations, and a score of 45 out of 100.
Same business, same day: Otterly tracks visibility. M.A.R.C. verifies accuracy.

We tested Da Mario. What is AI saying about your business? Find out free at marcscore.com.

The methodology works both ways

A fair test of any accuracy methodology needs to show it works in both directions: flagging problems where they exist and confirming accuracy where they don't.

Dishoom (Shoreditch, London) scored 88/100. Across 94 extracted claims, our engine found zero inaccuracies. Dishoom has strong entity authority — rich structured data, consistent directory listings, extensive reviews. AI represents it accurately because the ground truth is abundant and consistent.

The Ivy (London) scored 38/100. Across 72 claims it had the worst accuracy of any business we tested, with heavily hallucinated AI content despite being a high-profile brand. Visibility was high. Accuracy was not.

Bar chart comparing M.A.R.C. Accuracy Scores: Dishoom 88 out of 100 with zero inaccuracies, Da Mario 45 out of 100 with 6 inaccuracies and 10 hallucinations, The Ivy 38 out of 100 with heavy hallucination
Three businesses, three accuracy profiles. Visibility tools would report all three as "mentioned" without distinguishing accuracy.

What the Accuracy Gap means

Visibility without accuracy is a liability.

Consider what happens when a business follows the standard advice: "Optimise for more AI coverage." If AI is already saying incorrect things about the business — wrong prices, fabricated locations, invented services — then more coverage means more people encountering wrong information. AI's representation of the business reaches more customers while remaining just as inaccurate.

The consequences are concrete. A customer told by ChatGPT that Da Mario has a Covent Garden branch arrives at an address that doesn't exist. A diner shown fabricated menu prices walks in expecting one bill and gets another. A customer told a restaurant is BYOB arrives without wine and blames the restaurant for the confusion. In each case, the business gets the complaint, not ChatGPT.

Incorrect pricing is particularly damaging. If AI consistently tells customers a service costs less than it does, every first conversation starts with a price mismatch. The business either honours the wrong price (losing margin) or corrects it (starting the relationship with a disappointment). Neither outcome is good, and neither is visible in a visibility dashboard. We've seen this pattern play out beyond AI: in our case study of an SEO audit gone wrong, acting on inaccurate information led to measurably worse outcomes than doing nothing at all.

These errors also compound. AI platforms learn from each other via web scraping, citation chains, and training data. A hallucination that appears in one platform's response can propagate to others. The fabricated Covent Garden location for Da Mario didn't appear on just one platform — once one AI invents a "fact," others may treat it as established information. One wrong claim can become five wrong claims across five platforms within months.

This matters more as AI-referred customers become a larger channel. Research from Search Engine Land shows AI-sourced leads close in 18 days versus 29 for traditional search — they arrive pre-convinced by AI's recommendation, as we've documented in why AI leads close faster. But that trust is built on the assumption that AI's recommendation was accurate. When it wasn't, the business pays the price in wasted consultations, negative reviews, and lost repeat business. With 58.5% of searches now ending without a click (SparkToro/Datos, 2024), more customers than ever are making decisions based on what AI tells them without visiting the business's own website to verify.

Otterly's recommendation for Da Mario was to optimise for better coverage. That's sound advice — from a visibility perspective. From an accuracy perspective, it's incomplete. Before pursuing more mentions, a business needs to know whether the existing mentions are true.

This isn't a criticism of Otterly. They built a visibility tool, and it works well for that purpose. The gap exists across the entire market. As we documented in our comparison of 27+ AI visibility tools, the average price is $337/month, and not one of them verifies factual accuracy at any tier. The Accuracy Gap is an industry-wide blind spot, not a single product's limitation.

The gap matters because the two questions are fundamentally different:

  • Visibility tools answer: "Is AI mentioning my business?"
  • Accuracy tools answer: "Is what AI says about my business true?"

Both questions matter. But right now, only the first one has tools designed to answer it.

Why nobody has built this

The Accuracy Gap isn't a product oversight. It's a reflection of a genuinely harder technical problem.

Visibility monitoring is pattern matching. You query an AI platform, scan the response for brand mentions, and count them. The input is AI output. The check is string matching. A competent developer can build basic mention tracking in a weekend.
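To show how thin that check is, here is a minimal mention-tracking sketch in Python; the response text and brand name are invented for the example:

```python
import re

def count_brand_mentions(ai_response: str, brand: str) -> int:
    """Count case-insensitive occurrences of a brand name in an AI response."""
    pattern = re.compile(re.escape(brand), re.IGNORECASE)
    return len(pattern.findall(ai_response))

# Illustrative only: a real tool would fetch ai_response from a platform's API.
ai_response = "For Italian food in London, try Da Mario in South Kensington."
print(count_brand_mentions(ai_response, "Da Mario"))  # 1 -> "Brand Mentioned: Yes"
```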

Accuracy checking requires ground truth. Before you can check whether "Da Mario charges £15 for pasta" is correct, you need to know what Da Mario actually charges. That means crawling the business's website, parsing menus and pricing pages, extracting structured data from Google Business Profile, and resolving conflicts when different sources disagree. Assembling verified ground truth for a single business is the hard part — and it's what our engine automates through a multi-source pipeline that combines web crawling, structured data extraction, and the Google Places API.
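To see why ground truth is the hard part, consider conflict resolution alone. A minimal sketch, assuming a made-up source-priority rule (the source names, fields, and priorities below are illustrative, not our actual pipeline):

```python
from dataclasses import dataclass

# Hypothetical priority order for conflicting sources; illustrative only.
SOURCE_PRIORITY = {"business_website": 3, "google_places": 2, "directory": 1}

@dataclass
class Fact:
    field: str   # e.g. "address", "pasta_price"
    value: str
    source: str  # which pipeline stage produced the fact

def resolve(facts: list[Fact]) -> dict[str, str]:
    """Keep, for each field, the value from the highest-priority source."""
    best: dict[str, Fact] = {}
    for fact in facts:
        current = best.get(fact.field)
        if current is None or SOURCE_PRIORITY[fact.source] > SOURCE_PRIORITY[current.source]:
            best[fact.field] = fact
    return {field: fact.value for field, fact in best.items()}

facts = [
    Fact("address", "1 Example Street, London", "google_places"),
    Fact("address", "1 Example Street, South Kensington, London", "business_website"),
]
print(resolve(facts))  # the business website's version wins on priority
```

Even this toy version has to decide which source to trust; the real problem adds crawling, parsing, freshness, and dozens of fields per business.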

Visibility is binary. Accuracy exists on a spectrum. "Da Mario serves Italian food" is accurate. "Da Mario serves Italian food in Covent Garden" is partially accurate, partially hallucinated. Determining which requires comparing each individual claim against each data point, handling partial matches, and classifying edge cases. This is why we built a three-tier verification system: deterministic comparison for hard facts like addresses and phone numbers, LLM-powered semantic comparison for softer claims like service descriptions, and escalation routing for claims that need human judgment.
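A minimal sketch of that routing, assuming an illustrative hard-fact list, made-up thresholds, and a crude token-overlap stand-in where the real system would call an LLM:

```python
from enum import Enum

class Verdict(Enum):
    ACCURATE = "accurate"
    INACCURATE = "inaccurate"
    ESCALATED = "needs_human_review"
    UNVERIFIABLE = "unverifiable"

# Assumed examples of hard-fact fields; the production list is not published.
HARD_FACT_FIELDS = {"address", "phone", "opening_hours"}

def semantic_similarity(claim: str, truth: str) -> float:
    """Stand-in for the LLM comparison: crude token overlap in [0, 1]."""
    a, b = set(claim.lower().split()), set(truth.lower().split())
    return len(a & b) / max(len(a | b), 1)

def verify(field: str, claim: str, ground_truth: dict[str, str]) -> Verdict:
    truth = ground_truth.get(field)
    if truth is None:
        return Verdict.UNVERIFIABLE  # no ground truth, so no judgment
    if field in HARD_FACT_FIELDS:
        # Tier 1: deterministic comparison for hard facts
        match = claim.strip().lower() == truth.strip().lower()
        return Verdict.ACCURATE if match else Verdict.INACCURATE
    # Tier 2: semantic comparison for soft claims (thresholds are assumptions)
    score = semantic_similarity(claim, truth)
    if score >= 0.85:
        return Verdict.ACCURATE
    if score <= 0.35:
        return Verdict.INACCURATE
    return Verdict.ESCALATED  # Tier 3: ambiguous, route to a human
```

The point of the tiers is cost control: deterministic checks are cheap and reliable, semantic comparison handles variation in phrasing, and human review is reserved for the genuinely ambiguous cases.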

The gap will also persist for structural reasons. Visibility tools have a misaligned incentive: their dashboards are built to show green, positive metrics, and adding accuracy checking would surface errors, potentially making the product feel worse even though it's more honest. There's also a cost barrier: building automated ground truth assembly for millions of businesses requires significant investment in crawling infrastructure, structured data parsing, and API integrations that visibility tools were never designed to handle. And crucially, the market doesn't yet know to ask. Businesses buying visibility tools assume the mentions are correct, the same way early web analytics users assumed all traffic was real before bot detection became standard. The demand for accuracy checking will grow as businesses discover errors, not before.

Complementary, not competitive

We're not suggesting businesses should stop monitoring visibility. Understanding how AI visibility works — which platforms mention you, how often, and in response to which queries — is valuable data. Otterly, Profound, and similar tools provide it effectively.

What we are saying is that visibility monitoring alone gives an incomplete picture. Knowing that AI mentions you is step one. Knowing what AI says about you is step two. Both are necessary for a business to manage its AI presence effectively.

The full picture looks like this: monitor your visibility with the tracking tools that already exist. Verify your accuracy with tools that check what AI actually claims about you. Act on the findings from both.

Our Accuracy Engine is designed for the second layer. You can try a free scan at marcscore.com and see what AI gets right and wrong about your business — the data Otterly's dashboard can't show you.


Honest caveats

Our verification coverage for Da Mario was 42% — we could verify 24 of 57 extracted claims against ground truth data. The remaining 33 claims were unverifiable, meaning our data sources didn't cover those topics. We report this coverage figure prominently in every scan because it matters: a 45/100 score based on 42% coverage is an incomplete picture, and we say so.

Otterly was tested on a free trial with limited data population time. Their paid plans offer daily tracking that builds richer visibility data over time. Our comparison reflects what each tool measures, not the maturity of data available on the day of testing.

This study covers three businesses. It establishes a pattern, not a statistical law. We chose businesses with clear factual profiles specifically to make the comparison as fair as possible — restaurants have verifiable addresses, menus, and opening hours that AI claims can be checked against.

The broader pattern is consistent with what researchers have found at scale. Digital Bloom's AI citation research documented that content quality directly affects citation rates — attributed statistics deliver a 22% visibility lift, expert quotations 37%. But their research, like every study in the field, measured whether AI cites a source, not whether what AI says using that source is factually correct. The accuracy layer is missing from the research literature just as it's missing from the tools.

In our tools comparison post, we surveyed 27+ tools across the visibility market. In a future post, we'll go deeper into how the three most-searched tools — Profound's enterprise dashboard, Otterly's prompt tracking, and GrowthX's agency model — actually work in practice, and where accuracy checking fits into each workflow.


Frequently asked questions

What is the Accuracy Gap?

The Accuracy Gap is the space between AI visibility and AI accuracy. Current AI visibility tools track whether AI mentions your business, but none verify whether what AI says is factually correct. A business can be highly visible in AI responses while the information AI shares about it — prices, locations, services — is wrong.

How does accuracy checking differ from visibility monitoring?

Visibility monitoring answers "Does AI mention my business?" by tracking brand mentions, citation URLs, and prompt coverage across AI platforms. Accuracy checking answers "Is what AI says about my business true?" by extracting specific factual claims from AI responses, verifying them against ground truth data, and classifying each as accurate, inaccurate, or hallucinated.

Can AI visibility tools detect hallucinations?

No current AI visibility tool at any price tier detects hallucinations — AI-generated claims with no basis in reality. Tools like Otterly ($29–$489/month), Profound (from $99/month), and GrowthX (agency model) track mentions, citations, and coverage. None verify whether the factual content of AI responses is correct.

How is the M.A.R.C. Accuracy Score calculated?

The M.A.R.C. Score rates AI accuracy on a 0–100 scale. It extracts individual factual claims from AI responses using a three-tier system: deterministic comparison for hard facts (addresses, phone numbers), semantic comparison via AI for soft claims (service descriptions, specialisations), and escalation routing for edge cases. Each claim is classified as accurate, inaccurate, hallucinated, or unverifiable. The methodology is published in full.


Incentive disclosure: Findcraft built the Accuracy Engine and has a direct commercial interest in accuracy verification being valuable. Otterly is named because it is the tool we tested — the same gap exists across every visibility tool we examined. We've described Otterly's features and pricing accurately and do not claim it is a bad product. It is a good visibility tool. It is not an accuracy tool. These are different things.

Content methodology: This post was produced through the M.A.R.C. methodology. All data points trace to the source document captured on 14 April 2026 during hands-on testing. Every claim is registered in the Source Claim Registry at the top of this file's source code.