
The 30-Minute AI Visibility Audit: How to Find Out Exactly What ChatGPT, Perplexity, Claude, and Gemini Say About You

A free, no-tools, 30-minute audit that tells you whether AI search engines mention your company, mention you correctly, and cite you as a source. Includes the 24-prompt test set, the scoring rubric, and what to do with the results.


The Audit Most Founders Don't Run

A founder I work with told me last week that her organic search traffic was down 28% year-over-year. Her instinct was that Google had updated something. She wanted to fix the SEO. We spent fifteen minutes pulling up Search Console, looking at impressions and click-through rates. The data was unhelpful — no obvious algorithm hit, no penalty, no big query she'd lost.

Then I asked her to open ChatGPT and type "best AI proposal tool for design agencies" — her exact category. ChatGPT named six tools. Hers wasn't one of them. We tried Perplexity. Same. Claude described her category in a way that didn't match how she described it on her own site. Gemini got her company name confused with a similarly-named accounting product.

The missing traffic wasn't really Google's to give back. It was the slice of decision-stage buyers who used to find her through informational searches like "how to write a proposal faster" — those searches are now resolved inside an AI engine, and her name wasn't in the answers. The traffic isn't going to a competitor's site. It's not going anywhere. The buyer reads the AI answer, picks the recommended tool, and the click never happens.

This is the most under-run audit in marketing right now. It takes thirty minutes. It doesn't require tools. And it tells you exactly which of three problems you have, because each problem has a completely different fix.

What "AI Visibility" Actually Means

Before the audit, a quick framing. "Visibility" in AI search is not one thing. It's three layers, and they fail independently:

The three layers of AI visibility

  1. Entity recognition — does the model know your company exists, and what it does?
  2. Category inclusion — does the model name you when someone asks for a recommendation in your category?
  3. Source citation — when the model cites a source for its answer, does the link go to your domain?

You can pass one and fail the others. A company can be a known entity (Layer 1) but get omitted from "best of" recommendations (Layer 2). It can be cited as a source for someone else's answer (Layer 3) but not be the recommended product (Layer 2). The audit below tests all three at once.

One more thing worth knowing before you start: the four major engines split into two camps that fail differently. ChatGPT and Perplexity retrieve content live from the web for most queries — their failures are typically retrieval/citation problems. Claude and Gemini rely more heavily on training-data entities for queries without explicit search intent — their failures are typically entity-strength problems. The same prompt asked of all four can produce four different failure modes. That's why the audit covers all four.

The 30-Minute Audit

You'll need: a notepad; free accounts on ChatGPT, Perplexity, Claude, and Gemini (all have free tiers as of April 2026); and the six-prompt set you'll build in Step 1. You'll run each prompt on all four engines — twenty-four responses in total, about six minutes per engine. You will not need any paid tools.

Step 1: Build your six-prompt set (5 minutes)

You need three categories of prompts, two prompts each:

Category A: Branded recall. The model should know your company exists. Prompts:

  • "What does [your company] do?"
  • "Tell me about [your company] — pricing, who it's for, and how it compares to alternatives."

Category B: Category recommendation. The model should consider you for the use cases your buyers actually search. Prompts (rewrite for your category):

  • "Best [your category] tool for [your specific buyer persona]"
  • "What are the top three [your category] tools right now, with the trade-offs of each?"

Category C: Head-to-head. The model should mention you in comparisons against competitors. Prompts:

  • "[Top competitor] vs [your company] — which should I pick?"
  • "What's a good alternative to [top competitor]?"

The Category C prompts are deliberately phrased as if the buyer already knows the competitor. That's the realistic case: a buyer searching "alternative to [competitor]" is mid-funnel — high-intent — and AI engines handle this query class more reliably than open-ended "best" queries. If you don't appear here, you have a serious problem.
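
If you want the prompt wording pinned down so every future run is identical, a minimal script works. Here's a sketch in Python; the company, category, persona, and competitor values are invented placeholders, not real products:

```python
# Six-prompt audit set as templates, so quarterly runs use identical wording.
# All four values below are hypothetical placeholders; swap in your own.
COMPANY = "Acme Proposals"
CATEGORY = "AI proposal"
PERSONA = "design agencies"
COMPETITOR = "BigRival"

PROMPTS = {
    "A (branded recall)": [
        f"What does {COMPANY} do?",
        f"Tell me about {COMPANY} — pricing, who it's for, and how it compares to alternatives.",
    ],
    "B (category recommendation)": [
        f"Best {CATEGORY} tool for {PERSONA}",
        f"What are the top three {CATEGORY} tools right now, with the trade-offs of each?",
    ],
    "C (head-to-head)": [
        f"{COMPETITOR} vs {COMPANY} — which should I pick?",
        f"What's a good alternative to {COMPETITOR}?",
    ],
}

for label, prompts in PROMPTS.items():
    for prompt in prompts:
        print(f"[{label}] {prompt}")
```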

Step 2: Run the prompts across four engines (24 minutes)

Open ChatGPT, Perplexity, Claude, and Gemini in four browser tabs. For each engine, paste the six prompts one at a time. Don't combine them. Don't follow up. The first response is what matters — it's the answer the buyer would actually see.

For each response, write down two things on your notepad: the score (0, 1, or 2) and a one-line note on what was right or wrong. Use this rubric:

2 POINTS

The engine names your company and describes you accurately enough that a buyer reading the answer would understand what you do.

1 POINT

You appear only in source links, footnotes, or a brief mention without a real description. The buyer might click a link but won't form an opinion.

0 POINTS

No mention. Or a mention so wrong it would actively hurt — a buyer reading it gets a false impression.

You'll end with twenty-four scores: six prompts × four engines. Maximum 48 points. Most companies I've seen run this audit score between 9 and 18.

Step 3: Interpret the matrix (1 minute per question category)

Now sum the scores by row (per engine) and by column (per category). The pattern of failure tells you the fix.
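
Here's the matrix arithmetic as a minimal sketch, assuming you've transcribed your notepad scores into a table. The numbers shown are invented examples, not benchmarks:

```python
# 0/1/2 scores per the rubric above: 4 engines x 6 prompts
# (columns A1, A2, B1, B2, C1, C2). Example numbers, not real data.
scores = {
    "ChatGPT":    [2, 1, 0, 0, 1, 0],
    "Perplexity": [2, 2, 0, 1, 1, 1],
    "Claude":     [1, 0, 0, 0, 0, 0],
    "Gemini":     [0, 0, 0, 0, 1, 0],
}
categories = ["A (branded recall)", "B (category rec)", "C (head-to-head)"]

# Row sums: how each engine sees you (max 12 per engine).
for engine, row in scores.items():
    print(f"{engine:<11} {sum(row):>2}/12")

# Column sums per category: which visibility layer is failing (max 16 each).
for i, name in enumerate(categories):
    total = sum(row[2 * i] + row[2 * i + 1] for row in scores.values())
    print(f"{name:<22} {total:>2}/16")
```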

If Category A (branded recall) is weak across all four engines: you have an entity-strength problem. The models don't know who you are well enough to describe you. The fix is not more blog posts. It's third-party authority — Wikipedia article (if you qualify under notability rules), Crunchbase entry with funding and team, founder bios on credible sites, and earned press mentions in publications the models trained on.

If Category B (category recommendation) is weak but A is strong: the engines know you exist but don't think of you as a leader in your category. The fix is third-party recommendation surface: Reddit threads where users compare tools and you're discussed favorably, listicles on G2/Capterra/TrustRadius/SoftwareAdvice/Product Hunt where you appear in "top 5" lists, and category-defining content on your own domain that establishes you as a specific kind of solution.

If Category C (head-to-head) is weak but A and B are okay: you're missing comparison content. AI engines pull "X vs Y" answers from comparison pages — either yours or someone else's. If neither exists, the engine guesses, and the guess usually omits the smaller player. The fix is to publish comparison pages on your own domain (we have a separate playbook on writing X vs Y pages for AI search), and to seed comparison threads on Reddit and forums where engines retrieve.

Engine-specific patterns to watch for

If you fail on Claude and Gemini specifically but pass on ChatGPT and Perplexity, your problem is entity strength — the engines that lean on training data don't know you. If you fail on ChatGPT and Perplexity but pass on Claude and Gemini, your problem is current retrieval — your domain is poorly indexed by the live crawlers (likely a robots.txt issue, llms.txt issue, or weak third-party retrieval surface).
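
If you're in the second bucket, the cheapest thing to rule out first is an outright crawler block. Here's a sketch using Python's standard library; the user-agent strings are crawler names the vendors have published, but verify them against each vendor's current documentation before acting on the result:

```python
# Check whether robots.txt blocks the major AI crawlers.
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"  # replace with your own domain
# Publicly documented crawler tokens; confirm against current vendor docs.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetches and parses the live robots.txt

for agent in AI_CRAWLERS:
    verdict = "allowed" if rp.can_fetch(agent, f"{SITE}/") else "BLOCKED"
    print(f"{agent:<16} {verdict}")
```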

The Four Common Failure Modes (and What Each One Costs You)

After running this audit with about forty companies, I've found that four patterns explain almost everything. Most companies have two or three at once.

Mode 1: The hallucinated description

The engine names you, but the description is wrong. It says you're a project management tool when you're a customer support tool. It conflates you with a competitor of similar name. It quotes pricing that hasn't been correct since 2023. This is more common than founders realize: in our forty-company sample, sixteen had at least one engine return a materially incorrect description.

Cost: high. A buyer reading "you do X" who is actually shopping for something X-adjacent gets a false confirmation. They mention you to their team. Three weeks later they discover the mismatch on your real site. They feel deceived — by you, even though it was the AI. You don't get the second chance.

Fix: make sure your About page, your homepage hero, and your top-ranking blog post all use the same one-sentence description of what you do. Engines triangulate from these three sources. Inconsistency between them is what produces hallucinations. We have a separate guide on writing landing pages with this consistency in mind.
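
One way to keep that consistency honest over time is a periodic automated check. A rough sketch, assuming hypothetical URLs and a placeholder canonical sentence; a plain substring match is deliberately crude (markup can split a sentence mid-page), so treat a miss as a prompt to look, not proof of a problem:

```python
# Verify the same canonical one-liner appears on all three pages
# engines triangulate from. URLs and sentence are placeholders.
import urllib.request

CANONICAL = "Acme Proposals is an AI proposal tool for design agencies."
PAGES = [
    "https://example.com/",                # homepage hero
    "https://example.com/about",           # About page
    "https://example.com/blog/top-post",   # top-ranking blog post
]

for url in PAGES:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    status = "consistent" if CANONICAL in html else "MISSING or divergent"
    print(f"{url:<40} {status}")
```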

Mode 2: The "appeared in sources, not in answer"

The engine cites your blog post as a source at the bottom of the answer panel, but doesn't name you in the answer itself. The buyer reads the answer, ignores the source list, and never clicks. You contributed to the answer, you got zero credit, you got zero traffic.

Cost: medium-high. Your content is being mined for training and RAG retrieval, but your brand value isn't compounding. This is the worst of both worlds: you paid for the content, the engines benefit, and you don't.

Fix: write content where your brand name is structurally inseparable from the answer. If your blog post says "the right approach is X" without saying "we built [brand] around exactly this approach," you'll get cited but not named. Brand mentions inside the substance of an answer are what AI engines extract into the answer, not just into the source panel.

Mode 3: The category recommendation gap

The engine knows you exist (Category A passes), but when asked "what's the best tool for X," it lists three or four competitors and not you. This is the single most expensive failure mode: it's the exact moment a buyer is making a decision.

Cost: highest. This is the conversion-stage query. A "best X for Y" failure typically explains a meaningful chunk of any pipeline drop you can't account for in your normal analytics, because the conversion never happened on your site to begin with.

Fix: third-party recommendation surface. AI engines pull "best of" answers from review sites, Reddit, comparison content, and listicle articles. If you don't appear in those sources, you don't appear in the answer. Get into the listicles. Get reviewed on G2 and Capterra. Earn organic Reddit mentions (we have a separate playbook on the honest way to do that). Publish your own "best of" content where you appear as one of multiple options.

Mode 4: The competitor's brand is dominant

Every prompt produces an answer that names your top competitor first, your second-tier competitor second, and you third or not at all. You're stuck in the long tail. This is the slowest to fix but, paradoxically, the most predictable.

Cost: medium and growing. In a pure SEO world, second-tier visibility could still capture material traffic from searches on competitors' brands. In an AI search world, the answer is presented as one ranked recommendation; the second name is usually a footnote. The drop-off from #1 to #3 in an AI answer is steeper than the drop-off from #1 to #3 in a Google SERP.

Fix: lean into the comparison angle. The fastest way to climb in an AI category recommendation is to be the "best alternative to [dominant competitor]." Build comparison pages, comparison Reddit answers, comparison reviews. The query class "alternative to X" is high-intent and easier to capture than the open "best of X" — it's the second-place tool's lever.

The Audit Output: A One-Page Decision Matrix

Once you've scored all twenty-four responses and identified your dominant failure mode, you can fill out a one-page decision matrix that tells you exactly what to spend on. Here's the simplest version:

Your top failure mode → what to invest in next 90 days

Hallucinated description → Rewrite About page, homepage hero, and top blog post with one consistent one-sentence positioning. Submit to Wikipedia (if eligible) or update Crunchbase. Update llms.txt to point engines at your canonical descriptions.

Cited as source, not in answer → Audit your top 10 ranking blog posts. Rewrite them to include your brand name inside the substance of the answer, not just in the author bio.

Category recommendation gap → Earn 6–10 third-party listicle inclusions. Build out your G2/Capterra/Product Hunt presence. Seed Reddit answers in 3–5 relevant subs. Write one canonical "best of" post on your own domain that includes you as one option.

Competitor brand dominance → Build "alternative to [competitor]" page. Build "[competitor] vs [you]" page. Capture branded-alternative search volume on Google first; AI engines retrieve from those rankings.
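
If you keep the quarterly scores in code (as in the scoring sketch above), the matrix is easy to carry alongside them as a plain lookup. A sketch, with the actions paraphrased from the table:

```python
# Dominant failure mode -> next-90-day actions, from the matrix above.
NEXT_90_DAYS = {
    "hallucinated_description": [
        "Align About page, homepage hero, and top post on one positioning sentence",
        "Submit to Wikipedia (if eligible) or update Crunchbase",
        "Point llms.txt at the canonical descriptions",
    ],
    "cited_not_named": [
        "Audit top 10 ranking posts; put the brand name inside the answer substance",
    ],
    "category_recommendation_gap": [
        "Earn 6-10 third-party listicle inclusions",
        "Build out G2/Capterra/Product Hunt presence",
        "Seed Reddit answers in 3-5 relevant subs",
        "Publish one canonical 'best of' post that includes you as one option",
    ],
    "competitor_brand_dominance": [
        "Build an 'alternative to [competitor]' page",
        "Build a '[competitor] vs [you]' page",
        "Capture branded-alternative rankings on Google first",
    ],
}

for action in NEXT_90_DAYS["category_recommendation_gap"]:
    print("-", action)
```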

Re-run the Audit Quarterly (or Sooner)

The biggest mistake teams make after running this once is treating it as a one-shot exercise. AI search results drift faster than Google rankings. A new model release, a new training cutoff, a competitor's surge in a particular subreddit — any of these can change your visibility within weeks.

The audit takes thirty minutes. Run it once a quarter at minimum. Run it once a month if you're actively investing in AI visibility — otherwise you can't tell what's working. Pin the same twenty-four prompts so the data is comparable across runs.

Three things to track over time, beyond the score itself:

  • Score by engine. If your ChatGPT score jumps but Claude stays flat, you're winning on retrieval and losing on entity strength. The fix shifts.
  • The descriptions, verbatim. Save the actual one-sentence description each engine gives you. The drift in those sentences over time is your "AI brand drift" indicator (a comparison sketch follows this list). If three months ago Perplexity called you "an AI landing page critique tool" and now calls you "a generic landing page builder," something in the retrieval pool has shifted.
  • Which competitors get named alongside you. If a new name starts showing up in AI answers in your category, that's a competitor making AI search investments. You will see them in AI answers months before you see them in your sales pipeline.
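
For the drift indicator, even a crude string-similarity check between quarters will flag the big moves. A minimal sketch using the standard library; the example sentences are invented, and the 0.6 threshold is a judgment call, not a standard:

```python
# Compare the verbatim description an engine gave you this quarter
# against last quarter's saved copy.
from difflib import SequenceMatcher

previous = "An AI landing page critique tool for solo founders."
current = "A generic landing page builder."

similarity = SequenceMatcher(None, previous.lower(), current.lower()).ratio()
print(f"description similarity: {similarity:.2f}")
if similarity < 0.6:
    print("significant drift: check what changed in the retrieval pool")
```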

What This Audit Is Not

Two clarifications, because I've seen founders treat this audit as more (or less) than it is.

It is not a substitute for traffic analytics. The audit tells you what the engines say; it doesn't tell you what users do with that answer. You still need your AI traffic channel set up in GA4 to know whether the citations you do earn are converting. Both are needed.

It is also not a vanity benchmark. A company that scores 38/48 on this audit but has the wrong category positioning is still in trouble — the engines might describe them accurately, but the description is for a market that won't pay. Visibility ≠ commercial relevance. The audit tells you whether the engines see you. It doesn't tell you whether what they see is what your buyers want to buy. That's a separate question, and the audit is most useful when paired with honest reflection on it.

Why This Took Fifteen Minutes Five Years Ago and Now Takes Thirty

If you ran a brand visibility audit in 2021, it was a Google-only exercise: one engine, one behavior to understand. Today there are at least four engines that materially matter, each with different retrieval and entity behavior. The audit got harder because the buyer journey got more fragmented — the same buyer might use ChatGPT for the initial scan, Perplexity for citation hunting, Claude for the deep-dive, and Google for verification.

The thirty minutes are non-negotiable. There is no tool that gives you the same fidelity. Paid tools sample tens of thousands of queries and surface high-level metrics; that's useful, but it doesn't tell you whether the specific six prompts your buyers actually type return your name. Only your own prompts do that. The audit is irreplaceable for the same reason a manual five-second test of your hero is irreplaceable: at small scale, founder attention beats automation.

After the audit: fix the page the AI sends people to

If you fix your AI visibility but your landing page still buries the value prop, the AI traffic — small but high-intent — bounces. Run your page through roast.page after you fix the AI side. The engines drive a click; the page has to close it.

Start Here

Open ChatGPT. Type "What does [your company] do?" Read the answer. If the answer makes you wince, you have your starting point — and you've just done the first 1/24th of the audit. The other twenty-three prompts will tell you which fix moves the needle hardest.

Most teams find their result depressing for the first ten minutes and clarifying for the next twenty. The depression part is unavoidable. Almost no one who runs this audit honestly the first time scores well. What you do with the results is what separates the companies that compound visibility over the next year from the ones that don't.

