---
"@context": "https://schema.org"
"@type": "TechArticle"
"@id": "https://dasarakushi.com/notes/the-second-web"
headline: "The second web — why every URL is becoming two URLs"
description: "The public web is bifurcating. Browsers want the polished page; AI agents want clean prose with stable identifiers. The discovery mechanism, rel=alternate, has been hiding in HTML since RSS 2.0 shipped in 2002."
author:
  "@type": "Person"
  name: "Dasara Kushi"
  url: "https://dasarakushi.com"
datePublished: "2026-05-08"
dateModified: "2026-05-08"
keywords: "AI search, Generative Engine Optimization, GEO, rel=alternate, llms.txt, Markdown, JSON-LD, web architecture, content negotiation"
_format: "text/markdown"
_canonical: "https://dasarakushi.com/notes/the-second-web"
_publisher: "Dasara Kushi · dasarakushi.com"
---

# The second web

> Every URL is becoming two URLs. The discovery mechanism has been hiding in HTML since RSS 2.0 shipped in 2002.

A user clicks a product link in their browser. The browser asks for `text/html`, gets HTML, paints pixels.

Twenty seconds later ChatGPT, asked the same question, fetches the same URL. It receives the same HTML, runs a simplifier, and throws away the nav, footer, scripts, modals, the recommendations carousel, and twenty trackers. What survives is a few paragraphs of prose, an image alt attribute, and maybe a price.

Both clients asked for the same thing. Both got the same response. One was useful as designed. The other was useful only after destruction.

**Every URL is becoming two URLs.** The browser still wants the polished page. The AI agent wants clean prose, structured fields, and machine-stable identifiers. We have been giving both consumers the same response and hoping the parser is smart enough. It is, but barely. The symptoms are familiar: outdated price quotes, hallucinated stock status, mixed-up product variants, citations to obsolete URLs. These aren't model failures. They're parser failures. We made AI agents extract content designed for eyeballs.

## 1. What AI agents do with HTML

An LLM-powered fetch **strips** scripts, styles, iframes, tracking pixels, hidden inputs, nav menus, footers, cookie banners, modals, "you may also like" carousels, ad slots, share widgets, video players, comment threads, recommendation rails. It **keeps** heading text, paragraph text, list items, link anchors, image alt attributes, table cells, and occasional structured-data blocks.

Depending on the page, agent extraction strips 70–95% of the bytes you served. The remaining 5–30% is the only signal that reaches model context. When you wrote that page, were you writing for the 5–30% or the 70–95%?
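
The strip-and-keep behavior can be approximated in a few lines. This is a minimal sketch using only Python's standard library, not a description of how any particular agent works: the `SKIP_TAGS` list and the names `ProseExtractor`, `extract`, and `stripped_ratio` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements whose whole subtree is treated as chrome (an assumed list).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside", "iframe", "form", "noscript"}


class ProseExtractor(HTMLParser):
    """Collects visible text and image alt text, skipping chrome subtrees."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract(html):
    """Return the surviving text, one chunk per line."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def stripped_ratio(html):
    """Fraction of served bytes that did not survive extraction."""
    kept = len(extract(html).encode("utf-8"))
    return 1 - kept / len(html.encode("utf-8"))
```

Running `stripped_ratio` over a handful of your own templates is a quick way to see where a given page falls in that range.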

## 2. HTML is the wrong shape for machines

- **Visual hierarchy ≠ semantic hierarchy.** A `<div class="text-3xl">` is visually a heading; to a parser it's a div. Atomic spacing classes and component-driven markup are illegible to extractors.
- **Client-side rendering hides content.** Most LLM fetchers don't run JavaScript, so content that only exists after a 5 MB React bundle executes is invisible to them.
- **The page is mostly chrome.** Even on a server-rendered page, the content is a small island in a sea of nav, sidebars, rails, ads, footers, and tracking.

JSON-LD helps — the closest thing to a structured payload riding alongside HTML — but it's optional, often incomplete, and hard to keep in sync with what the page displays.
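
For a sense of what an extractor has to do to recover that payload, here is a hedged sketch that pulls JSON-LD blocks out of a page with the standard library. The class and function names are hypothetical, and silently skipping malformed blocks is an assumed policy, not a standard.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than fail the page


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Note what the sketch cannot do: it has no way to tell whether the price in the JSON-LD still matches the price the page displays, which is the sync problem described above.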

## 3. RSS solved this in 2002

When someone publishes a blog post, they put a magic line in the HTML head:

```html
<link rel="alternate" type="application/rss+xml" href="/feed.xml" />
```

Browsers ignore it. RSS readers follow it. Same URL, different shapes of the same content. The pattern is broader than RSS: `hreflang` points Spanish-speaking readers to a Spanish version; AMP pointed mobile crawlers to a stripped page; AppLinks mapped web URLs to native apps. Every time the consumer split, we used the same primitive: declare an alternate, let the consumer choose. We never needed a new spec. We needed a new payload.

For AI: declare a markdown alternate, let agents fetch it.

```html
<link rel="alternate" type="text/markdown" href="/llms.md" />
```

That's the whole spec. RSS readers respected it for twenty years. Why wouldn't AI agents?
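
On the consuming side, following the declaration is about as small as the declaration itself. The sketch below assumes an agent that already has the page's HTML in hand; `AlternateFinder` and `find_alternate` are illustrative names, not part of any existing agent or library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateFinder(HTMLParser):
    """Records the href of the first <link rel="alternate"> with the wanted type."""

    def __init__(self, wanted_type):
        super().__init__()
        self.wanted_type = wanted_type
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()  # rel is a space-separated token list
        if "alternate" in rels and (a.get("type") or "").lower() == self.wanted_type:
            self.href = a.get("href")


def find_alternate(html, page_url, wanted_type="text/markdown"):
    """Return the absolute URL of the declared alternate, or None."""
    finder = AlternateFinder(wanted_type)
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.href) if finder.href else None
```

The same function finds the RSS feed when called with `wanted_type="application/rss+xml"`, which is the point: the discovery step does not change, only the payload type does.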

## 4. llms.txt is a manifest, not the answer

`/llms.txt` solves site-level discovery: a manifest at the root telling AI consumers where the important content lives. Useful. Necessary. Not sufficient. `llms.txt` is a *directory*; the bifurcation pattern is *per-page content delivery*. Both layers belong:

- **Site-level (llms.txt):** here are my key pages, ranked, with summaries.
- **Per-page (rel=alternate):** here is *this* page in machine-readable form.

An agent answering a specific question doesn't want a site map. It wants the page. The page should announce its own machine-readable form, the same way every blog already announces its RSS feed.
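
To make the site-level layer concrete, here is a rough sketch of reading a manifest. It assumes entries are markdown list items shaped like `- [Title](url): summary`; that shape, the regular expression, and the `parse_manifest` name are assumptions for illustration, and real manifests may be looser.

```python
import re

# Assumed entry shape: "- [Title](https://example.com/page.md): optional summary"
ENTRY = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<summary>.*))?$"
)


def parse_manifest(text):
    """Return manifest entries in file order as dicts with title, url, summary."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append({
                "title": match["title"],
                "url": match["url"],
                "summary": (match["summary"] or "").strip(),
            })
    return entries
```

The output is a directory of pointers and nothing more, which is why the per-page declaration is still needed once the agent lands on a specific URL.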

## 5. What goes in the markdown variant

The markdown variant has three layers:

- **YAML frontmatter:** JSON-LD in friendlier clothes, with the Schema.org type, a stable `@id`, `dateModified`, and structured fields.
- **Prose:** the actual content, cleaned, with no chrome. Markdown is the format LLMs read most natively.
- **Provenance metadata:** demo flags, publisher, last refresh, source-quality notes.
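
A minimal sketch of assembling those three layers from one record might look like the following. It avoids a YAML library by writing each value with `json.dumps`, since a JSON scalar or object is also valid YAML; the underscore prefix for provenance keys mirrors this page's own frontmatter, and the function name and sample field values are hypothetical.

```python
import json


def render_markdown_variant(entity, prose, provenance):
    """Build frontmatter plus prose; provenance keys get a leading underscore."""
    lines = ["---"]
    for key, value in entity.items():
        # json.dumps yields a quoted scalar (or flow mapping) that YAML accepts.
        lines.append(f"{json.dumps(key)}: {json.dumps(value, ensure_ascii=False)}")
    for key, value in provenance.items():
        lines.append(f"_{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + prose.strip() + "\n"


doc = render_markdown_variant(
    {"@type": "Product", "@id": "https://example.com/p/wool-coat", "dateModified": "2026-05-08"},
    "# Wool Coat\n\nA grey wool coat.",
    {"demo": True, "publisher": "Example Store"},
)
```

Because the frontmatter and the prose come from the same record, the sync problem that plagues hand-maintained JSON-LD mostly disappears.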

Two working demos on this site: [/mockups/product-page](https://dasarakushi.com/mockups/product-page) (Nordstrom-style URL; markdown is full Product schema with variants and each query parameter classified) and [/mockups/person-page](https://dasarakushi.com/mockups/person-page) (Spokeo-style profile as Person schema with an address graph and a `_demo: true` flag).

## 6. Publish entities, not pages

Don't bulk-convert every URL. Most pages aren't entities. Ask: *would an AI answer ever cite this page as a source?*

**Strong yes:** product pages, person profiles, location pages, organization pages, article/explainer pages, glossary definitions, recipe/how-to pages.

**Weak no:** homepage (brand, not citation), category and listing pages (navigation, not entities), search-result pages (ephemeral), tracking-bound landing pages, faceted variants (these canonicalize *to* the entity).

This mirrors how schema-markup decisions already get made: you don't put Product schema on the homepage. **Stop counting URLs. Start counting entities.** A site with 100,000 URLs and 8,000 distinct entities should publish 8,000 markdown alternates, not 100,000.
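
Counting entities instead of URLs can be sketched as a canonicalization pass over a crawl. Everything specific here is an assumption: which query parameters identify an entity (`IDENTITY_PARAMS`) depends entirely on the site, and the function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed classification: parameters that select a different entity are kept;
# everything else (tracking, sorting, facets) is dropped.
IDENTITY_PARAMS = {"sku", "id"}


def canonical_entity_url(url):
    """Drop the fragment and every query parameter that does not identify an entity."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in IDENTITY_PARAMS)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


def count_entities(urls):
    """Number of distinct entities behind a list of crawled URLs."""
    return len({canonical_entity_url(u) for u in urls})
```

The set of canonical URLs that comes out of this pass, filtered by the citation test above, is the list of markdown alternates to publish.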

## 7. Where search lives, now and next

Today every website implements a mini search engine inside itself — category pages, faceted nav, internal search, recommendation rails. That infrastructure exists because the site does its own retrieval. In the AI-mediated future, the AI is the search engine; the site is a catalog of canonical entities plus a transaction surface. This is what grounding means: the model retrieves candidate sources, scores each one's evidence-fit, picks the strongest two or three, and cites them. **Stop building the search engine. Start publishing the catalog.**
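
The retrieve, score, pick, cite loop can be illustrated with a deliberately crude toy. Real systems use far richer retrieval and ranking than term overlap; the scoring rule and the names `evidence_fit` and `pick_citations` are assumptions chosen only to make the shape of the loop visible.

```python
import re


def tokens(text):
    """Lowercased alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def evidence_fit(question, source_text):
    """Toy score: share of question terms that appear in the source."""
    q = tokens(question)
    return len(q & tokens(source_text)) / len(q) if q else 0.0


def pick_citations(question, sources, k=3):
    """Return up to k source URLs with a nonzero score, best first."""
    scored = [(evidence_fit(question, text), url) for url, text in sources.items()]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:k]]
```

Even in this toy, a page whose text is a direct statement of the entity's facts outscores a listing page that merely mentions the same words, which is the argument for publishing the catalog.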

## 8. Objections

- **"This is cloaking."** No. Cloaking serves different content to bots than humans for the *same* URL. Bifurcation serves the same content in two formats from clearly distinct URLs, each declaring the other.
- **"AI agents won't follow alternate links."** Some don't yet — so the markdown URL works standalone too: reachable directly, listed in llms.txt and sitemap.xml.
- **"This duplicates effort."** Only if you write twice. One source of truth renders both views.
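- **"Search engines will see duplicate content."** The two URLs serve different MIME types; the markdown URL can be marked noindex for search (for a non-HTML response, via an `X-Robots-Tag: noindex` header); and rel=alternate tells crawlers the two URLs are the same content in different formats.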
- **"What about the gap between formats?"** The markdown version is allowed to be smaller. Omit trust signals and CTAs; focus on the answerable claim.
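
Several of these answers come down to response headers on the markdown URL. The sketch below shows one plausible set, assuming the markdown variant is served by code you control; the function name is hypothetical, and using a `Link` header with `rel="alternate"` to point back at the HTML page is one option among several, not an established convention for this use.

```python
def markdown_variant_headers(html_url):
    """Headers for the markdown alternate of the page at html_url."""
    return {
        # Declare the format explicitly so it is never mistaken for HTML.
        "Content-Type": "text/markdown; charset=utf-8",
        # Keep the alternate out of search indexes.
        "X-Robots-Tag": "noindex",
        # Point back at the HTML page, so each URL declares the other.
        "Link": f'<{html_url}>; rel="alternate"; type="text/html"',
    }


headers = markdown_variant_headers("https://example.com/notes/the-second-web")
```

With the HTML page declaring the markdown URL in its head and the markdown response declaring the HTML URL in its headers, neither format is hidden from any client.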

## 9. SEO splits in two

**HTML SEO** stays where it's always been: site architecture, internal links, schema, Core Web Vitals, content clusters, conversion design. **AI-content SEO** is something else: Schema.org fluency in markdown, freshness signals, machine-stable identifiers, evidence-fitness, alternate-link discoverability. They overlap on canonicalization and freshness; they diverge on layout, design, and conversion plumbing. If you're an SEO who codes, you ship both.

---

Two formats. One source. One declaration in the head. The infrastructure has been here for twenty years. The pattern is real. The implementation is small. The only thing missing is the convention.

*By Dasara Kushi · May 8, 2026 · [https://dasarakushi.com/notes/the-second-web](https://dasarakushi.com/notes/the-second-web)*
