Ethics & AI 5 min read intermediate

Reddit v. Perplexity: The Scraping Lawsuit That Could Reshape AI

Reddit sued Perplexity and three scraping firms (Oxylabs, SerpApi, AWMProxy) in October 2025, alleging they bypassed access controls to harvest Reddit content from Google search results. Crucially, Reddit leans on the DMCA's anti-circumvention provision (17 U.S.C. §1201) rather than copyright infringement, sidestepping fair-use defenses. A win could push platforms industry-wide to wall off user content and pursue licensing.

Aisha Patel

Jun 10, 2026

When Reddit created a single hidden test post that only Google's crawler could see, then, by its account, watched its contents surface inside Perplexity's answer engine within hours, it didn't just catch a vendor red-handed. It built the evidentiary backbone for what may become the defining AI-scraping case of the decade.

On October 22, 2025, Reddit filed suit in the Southern District of New York against Perplexity AI and three data-scraping intermediaries — Oxylabs, SerpApi, and AWMProxy. The complaint reads less like a copyright dispute and more like a forensic teardown of how content actually moves from a closed platform into a generative AI product. And the legal theory it leans on could matter far more than the headline grievance.

What Reddit Actually Alleges

Reddit's complaint describes an "industrial-scale" laundering operation. The claim is not simply that Perplexity read public Reddit posts. It's that the named intermediaries allegedly masked their identities, rotated IP addresses, and bypassed access controls to harvest billions of Google search-engine results pages containing Reddit URLs, text, images, and video — then funneled that data to Perplexity's "answer engine."

The smoking gun is elegant. Reddit planted a test post engineered to be crawlable only by Google's search engine and accessible nowhere else online. Within a few hours, queries to Perplexity reportedly reproduced that post's contents. If that holds up, it's circumstantial evidence that's very hard to wave away.

The genius of Reddit's test isn't the technology — it's that it converts a murky "did you scrape us?" question into a binary one. Either the hidden content was obtained through the search-results pipeline, or it appeared by magic.
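Reddit has not published exactly how its test post was built, so what follows is only a minimal sketch of the general "canary" idea, not a reconstruction of Reddit's method. The helper names `make_canary` and `leaked`, the post text, and the sample answers are all hypothetical. It shows why the question becomes binary: a random marker that exists in exactly one place either shows up downstream or it doesn't.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a hypothetical unique marker to embed in a test post."""
    # 16 random bytes -> 32 hex characters; an accidental match in organic text is negligible.
    return f"{prefix}-{secrets.token_hex(16)}"


def leaked(canary: str, observed_text: str) -> bool:
    """True if the marker appears verbatim in text returned by a third party."""
    return canary in observed_text


token = make_canary()
# Hypothetical test post; restricting who can see it (e.g. one crawler) is not modeled here.
post_body = f"Test post for provenance checking. Marker: {token}"

# Hypothetical answers collected from the product under test.
answer_without = "Here is a summary of some unrelated threads."
answer_with = f"According to a Reddit post... Marker: {token}"

print(leaked(token, answer_without))  # False
print(leaked(token, answer_with))     # True
```

Exact substring matching is the simplest possible check. An answer engine that paraphrases rather than quotes would call for distinctive phrasing or fuzzier matching, which is part of why a test like Reddit's is circumstantial evidence rather than proof.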

The Legal Theory That Matters

Most AI training-data lawsuits to date have been fought on copyright infringement grounds, where defendants raise fair-use defenses that remain genuinely unsettled in court. Reddit took a different road.

The lead claim invokes the Digital Millennium Copyright Act's anti-circumvention provision, 17 U.S.C. §1201, alongside unjust enrichment and unfair competition. The argument is that bypassing technical access controls that guard copyrighted works is itself unlawful, whether or not the downstream use would qualify as "fair."

That reframing is strategically significant. It largely sidesteps the fair-use thicket. If a platform erects technical barriers and a scraper defeats them through deception, the wrongdoing attaches to the circumvention, not the copying. This is the "terms of access" front that legal analysts have flagged as the next phase of AI data litigation.

Perplexity's Defense

Perplexity has pushed back, and its response deserves a fair hearing. The company says it summarizes content with citations and does not train its models on Reddit posts — a meaningful distinction, since a retrieval-and-cite system arguably operates differently from a model that ingests text into its weights.

Perplexity has also characterized the suit as an attempt to entrench Reddit's data-licensing business. Reddit signed lucrative content deals with Google and OpenAI; a cynic could read the litigation as protecting that revenue stream rather than user interests. There's force to that argument, and it shouldn't be dismissed.

But two facts complicate Perplexity's position. After Reddit sent a cease-and-desist letter, Perplexity's citations to Reddit reportedly increased roughly fortyfold — an awkward pattern for a company claiming restraint. And independent reporting has dogged Perplexity's crawling practices: Wired reported the company used undisclosed IPs and spoofed user-agent strings to bypass robots.txt, and Cloudflare publicly accused it of deploying "stealth, undeclared crawlers" that ignored no-crawl directives.
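Why would a spoofed user-agent string matter? Robots.txt is an honor system: its rules are keyed to the name a crawler reports about itself. The following minimal sketch uses Python's standard-library `urllib.robotparser`; the robots.txt contents, the bot name `ExampleAIBot`, and the URL are hypothetical. It illustrates the mechanism only and says nothing about what any company actually did, which remains an allegation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/r/some_thread"

# A crawler that declares its real name is told to stay out...
print(parser.can_fetch("ExampleAIBot", url))  # False
# ...but the same request under a generic browser-like name is allowed.
print(parser.can_fetch("Mozilla/5.0", url))   # True
```

Nothing in the file can verify who is asking, which is why site operators also watch IP ranges and behavior, and why an undeclared crawler is treated as a breach of trust rather than a technical bug.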

Why This Case Is Bigger Than Two Companies

The uncomfortable truth is that the entire AI search ecosystem runs on data whose provenance is contested. A ruling that circumvention-based access is independently actionable would ripple far beyond Perplexity.

Consider what's at stake for each side of the table:

Stakeholder	What a Reddit win would mean
Platforms	A powerful new tool to wall off user content and force licensing
AI search engines	Legal exposure for how data is obtained, not just how it's used
Scraping vendors	Direct liability as named co-defendants, not invisible middlemen
Users	Their posts become a bargaining chip between corporations

That last row is where the ethics get genuinely thorny. Reddit is framing itself as the guardian of its users' content. Yet those users wrote their posts for free, Reddit monetizes them through AI licensing deals, and the company now invokes their interests to defend that revenue. Whose data is it, really? The person who typed the comment has no seat at this table.

The Bottom Line

Reddit v. Perplexity is not really about whether scraping is rude. It's a deliberate attempt to relocate the legal battlefield from copyright — where AI companies have viable fair-use defenses — to access and circumvention, where they may not. If Reddit prevails, expect every major platform to harden its access controls and reach for §1201 the moment a crawler slips through.

The case won't resolve the deeper problem it exposes: an economy built on user-generated content has never given the users themselves a meaningful say in how their words get sold. But it will, at minimum, decide whether defeating a website's locks is a clever growth tactic or an illegal act. For an industry that has treated the open web as a free buffet, that's a verdict worth watching.

ai-ethics ai-regulation data-scraping perplexity copyright
