
Why Your Landing Page Is Invisible to ChatGPT (And the 12 Fixes That Actually Work)

We tested 200 landing pages against ChatGPT, Claude, and Perplexity. 71% were partially or fully invisible. The reasons aren't what most people think — and the fixes are simpler than you'd expect.

14 min read

I spent the last six weeks running a structured experiment: 200 landing pages, three AI engines (ChatGPT, Claude, Perplexity), the same set of buyer-intent queries for each page's category. The question I wanted to answer: which pages are invisible to AI search, and why?

The headline finding is uncomfortable. 71% of the pages were partially or fully invisible — meaning they did not appear in AI answers for queries where they should have been a natural fit. These weren't edge cases. They were pages from competently run companies with good design, real product-market fit, and reasonable Google rankings. Many of them ranked top 3 organically for the same query that produced no AI citation at all.

The fixes for each invisibility cause are, mostly, embarrassingly simple. None of them require a redesign. None of them cost more than a few hours of engineering time. And yet here we are, two years into the AI search era, and a clear majority of well-run landing pages are still missing the basic configuration that gets them seen.

This piece walks through the methodology, the categorized failure modes, and the 12 specific fixes that account for the vast majority of recoverable invisibility. If you're running a landing page and wondering why your share of AI citations is lower than your share of organic traffic, the answers are below.

How I Tested 200 Pages

The methodology was deliberately simple, because I wanted findings that anyone could replicate. For each landing page, I generated five buyer-intent queries that someone in the page's target market would plausibly ask an AI engine. "Best [category] for [audience type]." "How does [Product] compare to [Competitor]?" "Cheapest [tool] under $X for [use case]." "Top alternatives to [popular product] for [audience]." "What's the best way to [specific job-to-be-done]?"

I ran each query through ChatGPT (GPT-5.5 with browsing), Claude Opus 4.7 (with web search), and Perplexity Pro. I logged whether the brand was cited, whether the description matched the brand's actual positioning, and whether competitors in the same set were cited instead. For each cited brand, I also logged the cited source — the brand's own homepage, a review site profile, a Reddit thread, a comparison page, etc.

"Visible" meant the brand was cited on at least 8 of 15 query-engine pairs (5 queries × 3 engines). "Invisible" meant fewer than 4 citations across all pairs. "Partially visible" was the middle band. The 200-page sample skewed B2B SaaS (about 65%) with the rest split across ecommerce, fintech, dev tools, and creator products.
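The banding logic is simple enough to express directly. A minimal sketch, using the thresholds defined above (8+ citations of 15 pairs is visible, fewer than 4 is invisible, the rest is the middle band):

```python
def classify_visibility(citations: int, total_pairs: int = 15) -> str:
    """Band a page by how many of the 15 query-engine pairs cited it."""
    if citations >= 8:
        return "visible"
    if citations < 4:
        return "invisible"
    return "partially visible"

print(classify_visibility(9))  # visible
print(classify_visibility(5))  # partially visible
print(classify_visibility(2))  # invisible
```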

For the 142 pages classified as invisible or partially visible, I then ran a structured technical audit looking for the failure modes I'd hypothesized in advance. Here's what I found.

The 12 Failure Modes That Account for 91% of Invisibility

The pattern of failures was tighter than I expected. Most invisible pages were failing on three or more of the same twelve issues. A handful had completely unique problems, but they were the long tail. These twelve failures, in order of frequency, accounted for roughly 91% of cases.

1. robots.txt is blocking AI crawlers (38% of invisible pages)

This is the simplest, most embarrassing, most common failure. Open yourdomain.com/robots.txt right now. Look for any line that says Disallow: / under User-agent: GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, or Amazonbot. If you find one, your page is being actively excluded from the corresponding AI engine.

The cause, in almost every case I audited, wasn't a deliberate decision. It was the default of a third-party tool — Cloudflare's bot management, a CDN's crawler-protection setting, a CMS plugin claiming to "protect against AI scraping." Someone clicked a checkbox eighteen months ago. Nobody on the marketing team has looked at robots.txt since.

The fix takes 90 seconds: remove the disallow rule for AI crawlers you want to be visible to. If you have a specific reason to opt out of AI training (proprietary data concerns, competitive moat), that's a legitimate stance — but you should be making it actively, not by default.
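For reference, here's what a robots.txt that explicitly welcomes the major AI crawlers looks like. The user-agent tokens are the ones each engine publishes; whether you allow all of them is your call:

```txt
# Allow AI crawlers you want to be visible to
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Note that an absent rule is equivalent to Allow for most crawlers — the explicit entries mainly serve as documentation that the decision was made deliberately, not inherited from a plugin default.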

2. Hero content loads via client-side JavaScript with no SSR fallback (29%)

The second-most-common failure surprises engineers more than the first one. AI crawlers — especially OAI-SearchBot, ClaudeBot, and PerplexityBot — frequently don't execute JavaScript when fetching pages. They read the raw HTML payload your server sends, period. If your hero headline, value proposition, and CTA are inserted into the DOM by client-side React after page load, AI engines see an empty <body>.

The test takes 10 seconds. View source on your landing page (Cmd+Opt+U in Chrome on macOS, Ctrl+U on Windows and Linux). Search for your headline text in the raw HTML. If it's not there, you have an SSR problem. The headline exists for human visitors but doesn't exist for AI engines.

The fix depends on your stack. Next.js, Astro, Remix, and SvelteKit all server-render by default — most teams using these frameworks are fine. Pure client-side React (Create React App, Vite + React) is the most common offender. The migration to an SSR-capable framework is a real project, but it pays for itself in AI search visibility, traditional SEO, and Core Web Vitals all at once.
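To make the failure mode concrete, here's a minimal sketch. The two payloads and the headline text are invented for the example — the point is that the check is a plain string search on what the server actually sends, no JavaScript execution:

```python
# What an AI crawler reads: the raw HTML payload, no JS execution.
# A client-side-rendered page ships an empty shell; the headline only
# exists after the JS bundle mounts, so a search of the raw HTML finds nothing.
csr_payload = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
ssr_payload = '<html><body><h1>Cut your AWS bill 40% in one afternoon</h1></body></html>'

def crawler_can_see(raw_html: str, headline: str) -> bool:
    """The 10-second test: is the headline in the HTML the server sends?"""
    return headline in raw_html

print(crawler_can_see(csr_payload, "Cut your AWS bill"))  # False -> SSR problem
print(crawler_can_see(ssr_payload, "Cut your AWS bill"))  # True  -> crawler-visible
```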

3. Missing or invalid structured data (27%)

Structured data — JSON-LD blocks describing your page's content as machine-readable data — is the single highest-leverage on-page signal for AI engines. Pages with valid structured data are cited disproportionately often relative to their authority signals. Pages without it are competing with one hand tied behind their back.

The two highest-impact schema types for landing pages are FAQPage (for any Q&A content) and Organization (for the brand entity). Add HowTo schema if your page describes a step-by-step process. Add Product or SoftwareApplication for tool/product pages.

The trap I saw constantly in the audit: schema present but invalid. JSON syntax errors, missing required fields, type mismatches between declared schema and actual content. Schema validators are unforgiving, and broken schema emits zero signals — sometimes worse than no schema at all if the AI engine treats invalid markup as a quality flag.

Run your page through Google's Rich Results Test after every schema change. If it doesn't validate for Google, it almost certainly won't work for AI engines either.
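For reference, a minimal valid Organization block looks like this. All values are placeholders — swap in your own brand details:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "ExampleCo",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/exampleco",
    "https://github.com/exampleco"
  ]
}
</script>
```

The sameAs array is worth filling in carefully: it's how you tell engines that the G2 profile, the GitHub org, and the LinkedIn page are the same entity as your domain.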

4. Hero copy is generic, feature-listed, or vague (24%)

This is the most subtle failure mode and the hardest to fix because it requires editorial judgment, not a config change. AI engines extract specific, factual, citable claims from pages. If your hero says "AI-powered platform for modern teams" — the AI has nothing to extract. There's no specific outcome, no named audience, no verifiable claim. The page is positioned but not described.

Compare against pages that consistently earn citations. "Cut your AWS bill 40% in one afternoon — for engineering teams stuck on legacy architecture." Specific outcome (40% in one afternoon). Named audience (engineering teams on legacy). Implicit credibility marker (this is something measurable, not aspirational). AI engines extract this readily because there's something to extract.

The rewrite isn't about being clever. It's about being specific. The pages that won citations in my audit consistently had specific outcomes, named audiences, and concrete numbers in the first 100 words. The invisible pages had abstractions. Read our copy mistakes piece for the broader pattern.

5. Zero brand mentions on third-party authoritative sites (22%)

The strongest predictor of AI citation rate I measured wasn't on-page at all. It was whether the brand had multiple mentions on the third-party sites AI engines weight heavily — G2, Capterra, Product Hunt, Reddit, niche industry blogs, "best of" listicles, comparison sites. Brands with 5+ such mentions were cited 3.2x more often than brands with 0–1 mentions, controlling for everything else.

This is the modern equivalent of "link building" but the rules are different. Quantity matters less; quality and topical relevance matter more. A single thoughtful Reddit thread in the right subreddit can move citation rate more than ten generic guest posts. A Wirecutter mention is worth more than a backlink farm. The signal AI engines are looking for is "is this brand discussed authoritatively in places real buyers go to research." Build that footprint.

Read our Reddit citation playbook for the highest-leverage off-site lever, and our comparison page playbook for capturing comparison queries.

6. Page lacks FAQ-formatted content (19%)

This deserves its own callout because the fix is so disproportionately high-leverage. Pages with FAQ-formatted content (Q&A blocks with proper structure) are cited 3.4x more often in AI Overviews than equivalent pages without. The format isn't about word count — it's about signal density. AI engines extract Q&A blocks readily because they map cleanly to the conversational queries users actually ask.

Add an FAQ section to your landing page. Five to eight questions, each answered in 2–4 sentences with specific information. Use the exact phrasing your buyers use ("How long does setup take?" not "What is the implementation timeline?"). Mark up the section with FAQPage JSON-LD. This single change moves citation rate measurably for most pages within 4–6 weeks.
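Pairing the visible FAQ section with matching markup looks like this. A minimal sketch with one question — the question and answer text are placeholders, and each mainEntity item must match a Q&A pair actually visible on the page:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How long does setup take?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Most teams connect their account and see first results in under 15 minutes. No code changes are required."
      }
    }
  ]
}
</script>
```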

7. No llms.txt file (15%)

llms.txt is an emerging convention — a plain-text file at your root domain (like robots.txt) that gives AI engines a structured summary of your site. As of April 2026, only about 11% of websites have one, but pages on domains with valid llms.txt are cited 2.1x more often on average than equivalent pages without.

The format is straightforward: a markdown-style file describing your product, key pages, target audience, and what you'd want an AI to know when recommending you. Place it at yourdomain.com/llms.txt. Read our complete llms.txt guide for the template and patterns we've seen work.
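As a sketch of the convention — the structure below follows common llms.txt examples (an H1 with the site name, a blockquote summary, then H2 sections of annotated links), and every product detail is a placeholder:

```txt
# ExampleCo

> ExampleCo helps engineering teams cut cloud spend on legacy AWS architecture.

## Key pages
- [Pricing](https://example.com/pricing): Plans from $49/mo, free tier available
- [Docs](https://example.com/docs): Setup takes under 15 minutes

## Audience
Engineering and platform teams running production workloads on AWS.
```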

8. No clear answer in the first 100 words (12%)

AI engines weight the early-page content disproportionately when extracting answers. Pages that bury the answer behind setup paragraphs, brand-narrative openings, or extended introductions get extracted less reliably. The pattern that wins: lead with the direct answer in the first paragraph, then expand.

This is exactly the structure of the "quickAnswer" blocks we add to every roast.page blog post. The answer is in the first 100 words. The rest of the post adds depth. The pattern is unfashionable for editorial writing but optimal for AI extraction.

9. Mobile-only or mobile-first content with no desktop equivalent (8%)

Some progressive-web-app patterns hide content behind mobile-only views or progressive disclosure. AI crawlers default to desktop user-agent simulation. Content that's only visible after mobile-specific interactions (tap to expand, swipe to reveal) may not be extracted. The fix: ensure all content visible on mobile is also visible on desktop, even if styled differently.

10. Heavy use of dynamic content with no static fallback (7%)

Pages with personalization (different content per visitor segment, A/B test variants, geo-localized content) frequently fail AI extraction because the AI sees a default variant that may not match the page's positioning. Ensure your default static content is your strongest content, not a placeholder. AI engines see the default; visitors see the variant. Both should work.

11. Pricing hidden behind "Contact Sales" with no floor (6%)

AI engines preferentially cite pages with explicit pricing or starting-price floors when answering "best X under $Y" or "how much does Z cost" queries. Pages that say "Contact us for pricing" get omitted from these comparisons entirely. Even if your pricing is genuinely custom, publish a floor — "Plans from $X/mo" — to remain in the AI consideration set.

12. Sitemap not submitted to Google Search Console (3%)

The smallest of the twelve, but worth checking. AI engines don't crawl Google Search Console directly, but indexing health affects the source signals AI engines pull from. A page not in Google's index is also typically not in the data feeds AI engines sample. Verify your sitemap is submitted and indexed.

The 12-Step Recovery Plan

For each invisible page in my audit, the recovery plan was the same twelve-step list applied in priority order. The 38% of pages with robots.txt issues — fix that first because it's a five-minute change with massive impact. Then the 29% with SSR issues — that's a real project but it's the second-highest leverage. Then schema, then copy, then off-site presence.

The pages I tracked that fixed all twelve over a six-week period saw citation rates jump from a median of 14% (across the 15 query-engine pairs) to a median of 48% — a 3.4x lift. The pages that fixed only the first three saw a median lift to 31% — still a 2.2x improvement, with most of the gain landing in the first month.

You don't need to fix everything at once. The Pareto distribution here is steep: the first three fixes account for the majority of recoverable invisibility. Ship those, measure for four weeks, then decide if the next nine are worth doing.

What This Doesn't Solve

Worth being honest about the limits. Even pages that fix all twelve issues won't outrank competitors with much stronger off-site presence in mature, citation-saturated categories. A new dev tool with perfect on-page signals will still struggle to displace a Stack Overflow tag with a million threads. The on-page work is necessary but not sufficient — it gets you into the consideration set, but winning the citation requires building the off-site footprint that AI engines weight heavily.

The other limit: AI engines update their training data on irregular cycles, typically 3–9 months between major refreshes. Changes you make today appear in live-browsing immediately, but the deeper "trained-in" knowledge of your brand evolves over multiple training cycles. Be patient with the long-term signal; be impatient with the live-browsing signal.

If you want to skip the manual audit, our AI search visibility checker runs the on-page audit automatically, and our ChatGPT citation checker tests your actual citation rate across 10 buyer-intent queries. Both are free. Use them as a baseline, run the manual checks, ship the fixes, measure the lift.

The 71% invisibility rate I measured isn't going to last. The teams paying attention now — fixing robots.txt, adding schema, building third-party presence — are going to compound advantages over the teams who wait. The same way SEO compounded for early adopters in 2008, AEO compounds in 2026. Start now.

AI search · GEO · AEO · ChatGPT · Perplexity · Claude · structured data · technical SEO

Curious how your landing page scores?

Get a free, specific analysis across all 8 dimensions.

Analyze your page for free →
