When Reddit created a single hidden test post that only Google's crawler could see, then watched its contents surface inside Perplexity's answer engine within hours, it didn't just catch a vendor red-handed. It built the evidentiary backbone for what may become the defining AI-scraping case of the decade.
On October 22, 2025, Reddit filed suit in the Southern District of New York against Perplexity AI and three data-scraping intermediaries — Oxylabs, SerpApi, and AWMProxy. The complaint reads less like a copyright dispute and more like a forensic teardown of how content actually moves from a closed platform into a generative AI product. And the legal theory it leans on could matter far more than the headline grievance.
What Reddit Actually Alleges
Reddit's complaint describes an "industrial-scale" laundering operation. The claim is not simply that Perplexity read public Reddit posts. It's that the named intermediaries allegedly masked their identities, rotated IP addresses, and bypassed access controls to harvest billions of Google search-engine results pages containing Reddit URLs, text, images, and video — then funneled that data to Perplexity's "answer engine."
The smoking gun is elegant. Reddit planted a test post engineered to be crawlable only by Google's search engine and accessible nowhere else online. Within a few hours, queries to Perplexity reportedly reproduced that post's contents. If that holds up, it's circumstantial evidence that's very hard to wave away.
The genius of Reddit's test isn't the technology — it's that it converts a murky "did you scrape us?" question into a binary one. Either the hidden content was obtained through the search-results pipeline, or it appeared by magic.
The Legal Theory That Matters
Most AI training-data lawsuits to date have been fought on copyright infringement grounds, where defendants raise fair-use defenses that remain genuinely unsettled in court. Reddit took a different road.
The lead claim invokes the Digital Millennium Copyright Act's anti-circumvention provision, 17 U.S.C. §1201, alongside unjust enrichment and unfair competition. The argument is that bypassing technical access controls is itself unlawful — independent of whether the underlying content is copyrightable or whether the use is "fair."
That reframing is strategically significant. It sidesteps the fair-use thicket entirely. If a platform erects technical barriers and a scraper defeats them through deception, the wrongdoing attaches to the circumvention, not the copying. This is the "terms of access" front that legal analysts have flagged as the next phase of AI data litigation.
Perplexity's Defense
Perplexity has pushed back, and its response deserves a fair hearing. The company says it summarizes content with citations and does not train its models on Reddit posts — a meaningful distinction, since a retrieval-and-cite system arguably operates differently from a model that ingests text into its weights.
Perplexity has also characterized the suit as an attempt to entrench Reddit's data-licensing business. Reddit signed lucrative content deals with Google and OpenAI; a cynic could read the litigation as protecting that revenue stream rather than user interests. There's force to that argument, and it shouldn't be dismissed.
But two facts complicate Perplexity's position. After Reddit sent a cease-and-desist letter, Perplexity's citations to Reddit reportedly increased roughly fortyfold — an awkward pattern for a company claiming restraint. And independent reporting has dogged Perplexity's crawling practices: Wired reported the company used undisclosed IPs and spoofed user-agent strings to bypass robots.txt, and Cloudflare publicly accused it of deploying "stealth, undeclared crawlers" that ignored no-crawl directives.
Why This Case Is Bigger Than Two Companies
The uncomfortable truth is that the entire AI search ecosystem runs on data whose provenance is contested. A ruling that circumvention-based access is independently actionable would ripple far beyond Perplexity.
Consider what's at stake for each side of the table:
| Stakeholder | What a Reddit win would mean |
|---|---|
| Platforms | A powerful new tool to wall off user content and force licensing |
| AI search engines | Legal exposure for how data is obtained, not just how it's used |
| Scraping vendors | Direct liability as named co-defendants, not invisible middlemen |
| Users | Their posts become a bargaining chip between corporations |
That last row is where the ethics get genuinely thorny. Reddit is framing itself as the guardian of its users' content. Yet those users wrote their posts for free, Reddit monetizes them through AI licensing deals, and now invokes their interests to defend that revenue. Whose data is it, really? The person who typed the comment has no seat at this table.
The Bottom Line
Reddit v. Perplexity is not really about whether scraping is rude. It's a deliberate attempt to relocate the legal battlefield from copyright — where AI companies have viable fair-use defenses — to access and circumvention, where they may not. If Reddit prevails, expect every major platform to harden its access controls and reach for §1201 the moment a crawler slips through.
The case won't resolve the deeper question it exposes: that an economy built on user-generated content has never given the users themselves a meaningful say in how their words get sold. But it will, at minimum, decide whether defeating a website's locks is a clever growth tactic or an illegal act. For an industry that has treated the open web as a free buffet, that's a verdict worth watching.


