How CSS Can Scramble Your HTML (And Why AI Scrapers Can't Unscramble It)
Most web scrapers don't render pages. They fetch raw HTML and parse it — because rendering is expensive. That's an exploitable weakness.
I built obscrd, an open-source React SDK that rearranges your HTML so the DOM is scrambled, then uses CSS to put everything back in the correct visual order. The browser renders it perfectly. Scrapers reading the source get garbage.
Here's how it actually works under the hood.
The core trick: CSS order overrides DOM order
Flexbox has a property called order that controls the visual sequence of child elements. The DOM order doesn't change — but what the user sees does.
<div style="display:flex">
  <span style="order:2">world</span>
  <span style="order:0">Hello</span>
  <span style="order:1"> </span>
</div>
A human sees: Hello world
A scraper running element.textContent sees: world Hello
This is valid, standards-compliant CSS, and every browser supports it. Screen-reader behavior is subtler than it looks (more on that below). Any tool reading the raw DOM gets the shuffled version.
That's the entire foundation of obscrd's text obfuscation.
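The reordering the browser performs is easy to simulate. This sketch (illustrative names, not obscrd code) contrasts what a DOM-reading scraper extracts with what the flexbox algorithm paints:

```typescript
// Each flex child pairs its DOM text with its CSS `order` value.
type FlexChild = { text: string; order: number }

const domChildren: FlexChild[] = [
  { text: "world", order: 2 },
  { text: "Hello", order: 0 },
  { text: " ", order: 1 },
]

// A scraper reading textContent gets the children in DOM order.
// (Whitespace in the real HTML source would add extra separators.)
const scraperView = domChildren.map(c => c.text).join("")

// The browser paints children sorted by ascending `order`
// (a stable sort, per the flexbox layout algorithm).
const visualView = [...domChildren]
  .sort((a, b) => a.order - b.order)
  .map(c => c.text)
  .join("")

console.log(scraperView) // "worldHello "
console.log(visualView)  // "Hello world"
```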
Two-level architecture: words and characters
obscrd uses a two-level shuffle. The outer container shuffles words using display:flex and order. Each word is then wrapped in its own display:inline-flex container that shuffles characters within the word.
Here's what the word "Hello" looks like at medium protection:
<span style="display:inline-flex">
  <span data-o="3" style="order:3">l</span>
  <span data-o="0" style="order:0">H</span>
  <span data-o="4" style="order:4">o</span>
  <span data-o="1" style="order:1">e</span>
  <span data-o="2" style="order:2">l</span>
</span>
The DOM order is l H o e l. The CSS order values reconstruct H e l l o visually. A scraper sees lHoel. A human sees Hello.
Scale this to a full paragraph and the raw HTML becomes completely unreadable.
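Generating that structure is a seeded shuffle plus bookkeeping. A sketch (illustrative code, not obscrd's internals; the tiny LCG stands in for the real seeded PRNG):

```typescript
// Tiny seeded LCG stand-in for a real PRNG, so output is reproducible.
function lcg(seed: number): () => number {
  let s = seed >>> 0
  return () => ((s = (s * 1664525 + 1013904223) >>> 0) / 2 ** 32)
}

// Shuffle a word's characters in the DOM while emitting `order` values
// that let CSS reconstruct the original sequence.
function shuffleWord(word: string, seed: number): string {
  const rand = lcg(seed)
  // Pair each character with its original index; that index becomes
  // both the data-o attribute and the CSS order value.
  const chars = [...word].map((ch, i) => ({ ch, order: i }))
  // Fisher-Yates shuffle randomizes DOM position.
  for (let i = chars.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1))
    ;[chars[i], chars[j]] = [chars[j], chars[i]]
  }
  const spans = chars
    .map(c => `<span data-o="${c.order}" style="order:${c.order}">${c.ch}</span>`)
    .join("")
  return `<span style="display:inline-flex">${spans}</span>`
}

console.log(shuffleWord("Hello", 42))
```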
Decoy injection: poisoning the scraper's output
Shuffling alone is reversible if you figure out the pattern. So obscrd also injects decoy characters — invisible spans containing random characters that are hidden from both visual rendering and screen readers:
<span aria-hidden="true"
style="position:absolute;clip:rect(0,0,0,0);
font-size:0;width:0;height:0;overflow:hidden">
x
</span>
These decoys are scattered between the real character spans. A scraper parsing the DOM picks up both real and fake characters with no way to distinguish them. The hiding technique uses clip + absolute positioning rather than display:none — because smarter scrapers know to skip display:none elements.
At maximum protection level, obscrd also injects zero-width Unicode characters (U+200C and U+200D) between spans. These are invisible even in the raw text output, but they corrupt any text extraction that doesn't explicitly filter them.
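The interleaving step could look like this (assumed shapes and probabilities, not the library's exact code):

```typescript
const DECOY_STYLE =
  "position:absolute;clip:rect(0,0,0,0);font-size:0;width:0;height:0;overflow:hidden"
const ZERO_WIDTH = ["\u200C", "\u200D"] // ZWNJ and ZWJ

// Interleave hidden decoy spans and zero-width characters between the
// real character spans. `rand` is a seeded PRNG returning values in [0, 1).
function injectDecoys(realSpans: string[], rand: () => number): string {
  const out: string[] = []
  for (const span of realSpans) {
    out.push(span)
    if (rand() < 0.5) {
      // Random lowercase letter as a decoy; hidden from layout and AT.
      const decoy = String.fromCharCode(97 + Math.floor(rand() * 26))
      out.push(`<span aria-hidden="true" style="${DECOY_STYLE}">${decoy}</span>`)
    }
    // Zero-width character corrupts naive text extraction invisibly.
    out.push(ZERO_WIDTH[Math.floor(rand() * ZERO_WIDTH.length)])
  }
  return out.join("")
}
```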
Contact obfuscation: the RTL trick
Email addresses and phone numbers need a different approach because their visual format matters (you can't shuffle characters in an email and still have it be recognizable).
obscrd uses the CSS direction and unicode-bidi properties:
.obscrd-email-a1b2c3d4 {
  direction: rtl;
  unicode-bidi: bidi-override;
}
The HTML contains the email reversed: ved.drcsbo@olleh. The CSS flips it visually so the user reads hello@obscrd.dev. A scraper reading the DOM gets the reversed version — plus randomly injected decoy characters that look like plausible email characters (@, ., -).
This defeats email-harvesting bots that regex-extract addresses or look for mailto: links in raw HTML: the pattern simply doesn't exist in the source.
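The generation side is just a string reversal. A sketch (illustrative helper names; decoy injection omitted):

```typescript
// Reverse an address for the DOM; the matching CSS rule
// (direction: rtl; unicode-bidi: bidi-override) restores it visually.
function reverseForRtl(address: string): string {
  return [...address].reverse().join("")
}

// Hypothetical class name, matching the pattern shown above.
function emailMarkup(address: string, className: string): string {
  return `<span class="${className}">${reverseForRtl(address)}</span>`
}

console.log(emailMarkup("hello@obscrd.dev", "obscrd-email-a1b2c3d4"))
// <span class="obscrd-email-a1b2c3d4">ved.drcsbo@olleh</span>
```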
Deterministic seeding: same input, same output
Obfuscation needs to be deterministic for SSR and caching. If the shuffle is random, the server-rendered HTML won't match the client hydration, and React will throw errors.
obscrd uses a Mulberry32 PRNG (pseudorandom number generator) seeded from a project-specific key. The seed is derived using double FNV-1a hashing:
function fnv1a(str: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

function deriveSeed(masterSeed: string, contentId: string): string {
  const input = `${masterSeed}:${contentId}`
  const h1 = fnv1a(input)
  const h2 = fnv1a(`${input}:${h1.toString(16)}`)
  return h1.toString(16).padStart(8, '0')
    + h2.toString(16).padStart(8, '0')
}
Same seed + same text = identical obfuscation output, every time. Different content blocks get different shuffles. The double hash yields 64 bits of output, so by the birthday bound, collisions only become likely at around 2^32 (~4 billion) distinct content blocks.
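The Mulberry32 generator itself is tiny. This is the well-known implementation (how obscrd feeds the 64-bit derived seed into its 32-bit state is an assumption I'm not showing here):

```typescript
// Mulberry32: a fast 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) | 0
    let t = Math.imul(state ^ (state >>> 15), 1 | state)
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same seed, same sequence: this is what makes server-rendered markup
// and client hydration agree byte-for-byte.
const a = mulberry32(0xdeadbeef)
const b = mulberry32(0xdeadbeef)
console.log(a() === b()) // true
```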
Three protection levels
obscrd supports three levels, giving developers control over the tradeoff between protection strength and DOM complexity:
Light — Word-level shuffle only. The characters within each word are in the correct order, but words are rearranged in the DOM. Minimal DOM overhead.
Medium (default) — Word shuffle + character shuffle + decoy character injection. Each word's characters are individually shuffled and interspersed with invisible decoy spans.
Maximum — Everything in medium, plus zero-width Unicode injection and user-select:none on the container. Even if someone selects the text, the clipboard gets nothing. (There's also a separate clipboard interceptor that replaces any copied text with a random shuffle.)
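The clipboard interceptor is a small copy-event handler. A sketch, with the browser wiring shown as comments so the snippet runs anywhere (names are illustrative, not obscrd's API):

```typescript
// Fisher-Yates shuffle of the copied text: the clipboard payload becomes
// a random permutation of the selection.
function shuffleText(s: string): string {
  const chars = [...s]
  for (let i = chars.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[chars[i], chars[j]] = [chars[j], chars[i]]
  }
  return chars.join("")
}

// Browser wiring (sketch):
// container.addEventListener("copy", (e) => {
//   const selection = document.getSelection()?.toString() ?? ""
//   e.clipboardData?.setData("text/plain", shuffleText(selection))
//   e.preventDefault() // stop the browser writing the real selection
// })
```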
What about screen readers?
This is the question I get the most, and it's the constraint I spent the most time on.
CSS flexbox order is a visual reordering property, and the caveat is worth stating precisely: the CSS Flexible Box spec says order does not affect ordering in non-visual media such as speech, and screen readers like NVDA, JAWS, and VoiceOver generally read from the accessibility tree, which follows DOM order. Accessibility therefore cannot rest on order alone.
For contact components (email, phone), obscrd uses a different approach: a visually-hidden <span> with sr-only styles contains the clean text for screen readers, while the visible obfuscated text has aria-hidden="true".
The result: WCAG 2.2 AA compliance maintained, scraper protection active.
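A sketch of that pairing (the sr-only style string and helper are illustrative, not obscrd's exact output):

```typescript
// Classic visually-hidden ("sr-only") style: present in the accessibility
// tree, invisible on screen.
const SR_ONLY =
  "position:absolute;width:1px;height:1px;padding:0;margin:-1px;" +
  "overflow:hidden;clip:rect(0,0,0,0);white-space:nowrap;border:0"

// Pair a clean copy for screen readers with the reversed, aria-hidden
// copy that sighted users see un-reversed via CSS.
function accessibleEmail(address: string): string {
  const reversed = [...address].reverse().join("")
  return (
    `<span style="${SR_ONLY}">${address}</span>` +
    `<span aria-hidden="true" style="direction:rtl;unicode-bidi:bidi-override">${reversed}</span>`
  )
}

console.log(accessibleEmail("hello@obscrd.dev"))
```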
The threat model (what it doesn't stop)
I want to be upfront about limitations, because this is critical context:
What obscrd stops:
- HTML scrapers reading textContent or innerHTML (BeautifulSoup, Cheerio, etc.)
- AI training crawlers (GPTBot, ClaudeBot, CCBot)
- Email/phone harvesting bots
- Casual copy-paste theft
- Simple automation reading the DOM
What obscrd does NOT stop:
- Headless browsers (Puppeteer, Playwright) — they execute CSS and read the rendered result, so they see exactly what humans see
- Screenshot + OCR — nothing at the application layer can prevent this
- Determined reverse engineers — CSS is readable, the pattern is discoverable
The realistic pitch is this: obscrd raises the cost of scraping significantly. Most scrapers use simple HTTP-based tools because they are cheap and fast; obscrd makes that approach useless. The minority that run headless browsers need real compute investment, and for most targets that cost exceeds the value of the data.
WASM-based text rendering (where text never exists in the DOM at all) is on the v2 roadmap, aimed at closing the headless-browser gap.
Try it yourself
Visit obscrd.dev and open DevTools. The text on the page looks normal. The DOM is scrambled. That's the pitch.
Or install it:
npm install @obscrd/react
npx obscrd init
import { ObscrdProvider, ProtectedText } from '@obscrd/react'
function App() {
  return (
    <ObscrdProvider seed={process.env.OBSCRD_SEED}>
      <ProtectedText>
        This text is readable by humans
        but scrambled for scrapers.
      </ProtectedText>
    </ObscrdProvider>
  )
}
MIT licensed, fully open source: github.com/obscrd/obscrd