The Opt-Out Illusion

Every AI coding tool promises you control over your data. We read the fine print.
By Nadia Byer · March 25, 2026

This morning, GitHub announced that starting April 24, Copilot will use interaction data from Free, Pro, and Pro+ users to train AI models.26 Opt-out, not opt-in. Thirty days' notice. A dropdown menu buried in account settings. GitHub's framing: the policy "aligns with established industry practices." They're right. It does. That's the problem.

Every AI coding tool has a privacy page. Every privacy page says some version of the same thing: your code is yours, your data is protected, you can opt out.

We read nine privacy policies, nine terms of service, and every piece of documentation we could find. The promise is control. The reality is a spectrum — from genuine transparency to what can only be described as privacy theater. The opt-out exists. What it opts you out of is the question nobody is asking loudly enough.

Here is what we found.

· · ·

The Default Problem

Five of the nine tools we examined train on your code by default.

Cursor's Privacy Mode is off by default for individual users. Your code, your prompts, your editor actions — all used to train models unless you find the toggle and flip it.1 Roughly half of Cursor's users have Privacy Mode enabled. That means roughly half don't.

Amazon Q Developer's free tier uses your code for training by default. Opting out requires configuring an AI services opt-out policy in AWS Organizations2 — enterprise infrastructure tooling, not a checkbox. If you're an individual developer on the free tier, the opt-out mechanism was designed for an organization with an AWS admin, not for you.
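For scale, here is roughly what that opt-out involves: a minimal sketch using AWS's documented AI services opt-out policy type, assuming you control an AWS Organizations management account in the first place. The IDs are placeholders.

```
# Sketch only: opt an AWS organization out of AI service data use.
# Must be run from the Organizations management account; root and policy IDs are placeholders.
aws organizations enable-policy-type \
  --root-id r-examplerootid \
  --policy-type AISERVICES_OPT_OUT_POLICY

cat > ai-opt-out.json <<'EOF'
{
  "services": {
    "default": {
      "opt_out_policy": { "@@assign": "optOut" }
    }
  }
}
EOF

aws organizations create-policy \
  --name ai-services-opt-out \
  --description "Opt out of AI service data use" \
  --type AISERVICES_OPT_OUT_POLICY \
  --content file://ai-opt-out.json

# Attach the policy ID returned above to the root, an OU, or an account.
aws organizations attach-policy \
  --policy-id p-examplepolicyid \
  --target-id r-examplerootid
```

That is the "checkbox" a free-tier developer is expected to find and operate.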

Google Gemini Code Assist's individual tier doesn't just train on your code. Human reviewers read it. Google's privacy notice says it plainly: human beings "read, annotate, and process the data" to improve products and ML models.3 The notice warns users not to "submit confidential information or any data you wouldn't want a reviewer to see or Google to use." That warning is the privacy policy. Not a safeguard — a disclaimer.

Windsurf trains on your code by default.4 Claude Code's consumer tier has defaulted to training since Anthropic updated its consumer terms in August 2025, presenting users with an "Accept" button, a training toggle pre-set to "On," and a September 28 deadline to choose.5

The pattern: if you don't pay, you are the product. Every tool that offers a free tier uses free-tier code for training or improvement by default. Every paid or enterprise tier offers stronger protections. Your code's privacy is a function of your subscription price.

· · ·

The Windsurf Penalty

Every tool in this investigation has an opt-out mechanism. Only one punishes you for using it.

Windsurf's terms of service, Section 10.2: "if you opt out, you will not have access to Chat Services."4

Read that again. If you opt out of having your chat content used for training machine learning models, you lose the chat feature. The opt-out is not free. It costs you a core product function. This is not an opt-out. It is a coercion mechanism with a toggle attached.

Windsurf also retains data after deletion for "backups, archiving, prevention of fraud and abuse, analytics"4 — language broad enough to justify keeping anything indefinitely. And if you created your account with a work email, your data may be disclosed to your employer.

No other tool studied does this. Every other tool lets you opt out without losing functionality. Windsurf decided that opting out of training should cost you the product.

· · ·

The Google Problem

Google Gemini Code Assist has two tiers. The enterprise tier is strong: no training, stateless processing, ISO 27001/27017/27018/27701 compliance, SOC 1/2/3, IP indemnification.6 The individual tier is the most invasive of any tool we examined.

Individual tier: human reviewers read your code. Data retained for up to 18 months. Used for training.3

And the opt-out is broken.

Gemini CLI users discovered that the /privacy command — the mechanism for managing training data consent — creates a "circular redirect loop" with no actionable steps.7 The bug is filed as Issue #14104, tagged p1/security. Multiple users confirmed the problem. As of this writing, it remains open.

A privacy opt-out that doesn't work is worse than no opt-out at all. It creates the appearance of consent management while providing none. Users who attempted to opt out and encountered the broken flow may reasonably believe they succeeded. They didn't.

· · ·

The "Currently" Problem

GitHub Copilot's training policy contains a word doing more legal work than any engineer at Microsoft: "currently."

Model training on individual code is described as "currently disabled for everyone" and "cannot be enabled."8 Reassuring. But the FAQ adds: "In the future, developers may be able to opt in to model training under certain circumstances, but only if they choose to."

"Currently disabled" is not "we will never do this." It is "we are not doing this yet." The door is open. The lock is labeled "future." GitHub has reserved the right to enable opt-in training whenever it decides to — the policy language preserves the option explicitly.

Amazon Q Developer does the same thing. Its GitHub preview documentation states: "Amazon Q Developer for GitHub (Preview) does not currently use your content for service improvement. If we enable this in the future, we will provide you with adequate notice."2

"Currently" means "until we change our mind." "Adequate notice" is defined by the company providing it. These are not privacy commitments. They are privacy placeholders.

· · ·

The Consent-by-Inertia Play

Anthropic changed its consumer terms in August 2025. Previously, consumer chat data was not used for training. Under the new terms, users had to choose.5

The mechanism: a pop-up titled "Updates to Consumer Terms and Policies" with a prominent "Accept" button and a smaller toggle for training permissions. The toggle was defaulted to "On."

Most people never change defaults. That's not speculation — it's decades of behavioral research and the entire business model of every cookie consent banner on the internet. An "On" default is an opt-in that looks like an opt-out. The arXiv privacy scorecard paper called Anthropic "the sole exception implementing a user-centric opt-in model."9 They were grading the mechanism. They were not grading the default.

Claude Code's documentation is, to its credit, the most transparent of any tool studied. Env vars disable every telemetry channel: DISABLE_TELEMETRY, DISABLE_ERROR_REPORTING, DISABLE_FEEDBACK_COMMAND, CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC.10 The data flow diagram and per-provider defaults table are unusually clear. But transparency about a default-On training toggle is not the same as a default-Off training toggle. You can see exactly how your data flows. It still flows unless you stop it.
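In practice, the full opt-out the documentation describes amounts to a few shell exports. A minimal sketch, using the variables named above; where you set them (a shell profile, a wrapper script) is up to you.

```
# Sketch: disable Claude Code's non-essential data channels via the
# environment variables its data-usage docs list. Add to ~/.bashrc or ~/.zshrc.
export DISABLE_TELEMETRY=1                          # usage metrics (Statsig)
export DISABLE_ERROR_REPORTING=1                    # error reports (Sentry)
export DISABLE_FEEDBACK_COMMAND=1                   # turns off the /feedback path
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1   # blanket switch for non-essential traffic
```

None of them is set for you.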

And then there's the /feedback trap. The /feedback command in Claude Code sends a full copy of conversation history — including all code discussed in the session — to Anthropic, retained for five years, regardless of your training preference.10 One command. Five years. The disclosure exists. The command is right there in the interface. It is, in practice, a data retention trapdoor labeled "feedback."

· · ·

The Enterprise Firewall

Every tool protects enterprise customers better than individual users. Every single one.

Tool · Individual default · Enterprise default
GitHub Copilot · Engagement telemetry always on · No code retained, ephemeral processing
Cursor · Privacy Mode OFF, code trains models · Privacy Mode enforced, CMEK available
Amazon Q Developer · Code trains models · No training, IP indemnity
Google Gemini · Humans read your code, 18-month retention · Stateless, no training, ISO certified
Windsurf · Code trains models, opt-out kills chat · Zero data retention, on-prem option
Claude Code · Training toggle default On, 5-year retention · No training, 30-day or ZDR
JetBrains AI · Behavioral telemetry trains ML models · Admin controls, code data opt-in only
Tabnine · Claims no training (unverified) · Same claims + on-prem/air-gapped
Sourcegraph Cody · Content may improve services · No training, self-hosted option

The gap is starkest at Google. An enterprise user gets stateless processing with ISO compliance. An individual user gets human reviewers reading their code for 18 months. Same company. Same product name. Different privacy universe.

This is the free tier tax. Individual developers — freelancers, students, open-source contributors, early-career engineers who can't expense a $40/month subscription — subsidize enterprise privacy with their code. The companies don't frame it that way. But that's what the policies say.

· · ·

The Surveillance Layer

Even if you opt out of everything, your employer might be watching how you use the tool.

GitHub's Copilot Dashboard provides managers with per-user, per-team, per-org metrics: daily and weekly active users, inline suggestion acceptance rates, lines of code added via Copilot, chat requests per active user, agent adoption rates, PR creation and merge counts, median time to merge.11 Twenty-eight-day trends. Exportable via API.
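"Exportable via API" is not a figure of speech. A sketch of the pull, assuming GitHub's documented Copilot metrics endpoint; ORG is a placeholder and the token needs the relevant org read permissions.

```
# Sketch: pull up to 28 days of org-level Copilot metrics via GitHub's REST API.
# ORG is a placeholder; GITHUB_TOKEN must have the appropriate org permissions.
curl -s \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/orgs/ORG/copilot/metrics"
```

One request, and the numbers are on a manager's desk.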

Microsoft Viva Insights integrates Copilot usage data with broader productivity metrics. The Copilot Dashboard — available to all Microsoft 365 subscribers — shows leaders and managers adoption rates by app, benchmarks by manager, region, and job function, "estimated financial savings," and behavior shifts.12 Microsoft's Agent Dashboard, in public preview since February 2026, adds agent-level tracking.

Viva Insights' personal insights feature includes a disclaimer: it is "not designed to enable employee evaluation, tracking, automated decision making, profiling, or monitoring."12 That disclaimer applies to the employee-facing personal insights surface. The Copilot Dashboard — which provides managers with per-user, per-team metrics — is a different surface of the same product.

The disclaimer describes one part of the product. The dashboard provides exactly the data needed to do everything the disclaimer disavows.

Under CCPA/CPRA, automated decision-making rules define "significant decisions" to include "employment or independent contracting opportunities or compensation."13 AI coding metrics used for performance evaluation would qualify. Colorado's SB 24-205, effective June 30, 2026 (postponed from February by SB 25B-004), will require safeguards against algorithmic discrimination in high-risk AI systems and grant the attorney general exclusive enforcement authority.14 But neither law specifically covers ongoing workplace surveillance via coding tool metrics. The regulatory frameworks exist. The enforcement doesn't.

· · ·

The Residual

Here is what every tool still collects after you've opted out of everything available to you.

Tool · What persists after maximum opt-out
GitHub Copilot · User Engagement Data: accept/dismiss rates, latency, features used, pseudonymous identifiers
Cursor · Embedding metadata: file names, hashes, line ranges. Account and device telemetry
Amazon Q Developer · "Client-side telemetry and usage metrics" — both tiers
Google Gemini (Individual) · Unknown — opt-out documentation is incomplete
Google Gemini (Enterprise) · User analytics, telemetry data, request/response events without content
JetBrains AI · Anonymous telemetry, behavioral data (trains ML models, default for non-commercial)
Windsurf · "Log and usage data" — cannot opt out without losing features
Claude Code · Statsig metrics, Sentry errors, session surveys — all disableable via env vars
Tabnine · Claims nothing. Unverified

"Opt out" does not mean "no data collected." It means "less data collected." The residual telemetry — engagement patterns, feature usage, acceptance rates, latency — is operational. It's also a behavioral profile of how you code.

GitHub's User Engagement Data "may include personal data, such as pseudonymous identifiers" and is used to "fine-tune ranking, sort algorithms, and craft prompts."8 You opted out of code retention. Your coding behavior still trains the product.

Claude Code is the only tool that lets you disable every non-essential data channel through environment variables.10 If you know they exist. If you set them. The average user won't. But the mechanism is there, which is more than anyone else offers.

· · ·

The Verification Gap

Here is the finding that should concern everyone: nobody has checked.

No independent third-party audit of any AI coding tool's privacy claims has been identified. Not one. SOC 2 Type II certifications exist for some tools, but SOC 2 audits verify that controls exist — not that they work as marketed.15 Tabnine claims "continuous monitoring and audits" but no public audit report has been found. The arXiv privacy scorecard paper evaluated policies and documentation — not technical behavior.9

No tool has been subjected to independent network traffic analysis confirming that opting out actually stops data transmission. No tool has been independently verified to delete data when it says it deletes data. No tool has proven that "ephemeral processing" is actually ephemeral.

A Cursor forum user put it plainly: "I need a piece of proper evidence to prove that when I turn privacy mode on, Cursor won't store any of my code."16 That evidence does not exist. For any tool.

A Consumer Reports / Wesleyan University study of 40 online retailers found that 30% appeared to ignore GPC opt-out requests entirely, while a separate Wesleyan analysis found only 45% of websites complied with opt-out signals.17 That's the baseline compliance rate for privacy opt-outs in general. There is no reason to believe AI coding tools perform better. There is no evidence either way. Nobody has looked.

· · ·

The Security Record

The privacy policies describe intended data flows. The security record describes what happens when those flows break.

CamoLeak, 2025: a CVSS 9.6 vulnerability in GitHub Copilot Chat. Researchers exploited it to silently search a victim's private codebase for AWS keys and exfiltrate them via image tags that bypassed Content Security Policy using GitHub's own infrastructure.18 GitHub patched by disabling image rendering in Copilot Chat entirely.

RoguePilot: attackers used passive prompt injections via GitHub issues to get Copilot to read internal files and exfiltrate GITHUB_TOKEN to remote servers.19

GitGuardian found that in roughly 20,000 repos where Copilot is active, over 1,200 — 6.4% — leaked at least one secret. That's 40% higher than the baseline across all public repos.20

Truffle Security found approximately 12,000 live secrets — API keys, tokens, passwords — in roughly 400 TB of Common Crawl data used for LLM training.21 The data that trains the models already contains the secrets the models are supposed to help protect.

And the arXiv scorecard's finding: four out of five coding assistants evaluated place the responsibility on the user to avoid inputting sensitive data like API keys.9 The tools don't filter secrets. They tell you not to paste them. That's the security model.

· · ·

The Regulatory Void

The legal frameworks exist. GDPR's right to erasure is practically incompatible with LLM training — once code enters a training set, the model cannot unlearn it. The EDPB's December 2024 opinion clarifies that AI models must be evaluated case-by-case for anonymization, noting the difficulty of ensuring individuals cannot be identified from trained models.22 Europe has issued over €6 billion in GDPR fines since 2018. California's CPRA regulations now expressly cover automated decision-making in generative AI systems.13 The EU AI Act's transparency obligations for general-purpose AI became binding in August 2025.23

The enforcement against AI coding tools specifically: zero. No regulatory action in any jurisdiction targeting AI coding tool training practices has been identified as of March 2026.

The companies know this. They are operating in the gap between laws that theoretically apply and enforcement that doesn't yet exist. "Currently disabled." "Adequate notice." "As long as necessary." This is policy language written for a regulator who hasn't arrived yet.

· · ·

What "Opt Out" Actually Means

After maximum opt-out on every tool — every toggle flipped, every env var set, every privacy mode enabled — here is what remains true:

Your code is still transmitted for processing. Every tool sends your code to remote servers for inference. Opting out of training does not opt you out of transmission. The code leaves your machine.

Behavioral telemetry persists. How you code — what you accept, what you reject, how long you pause, which features you use — is collected by nearly every tool regardless of privacy settings. This data trains product behavior even without your source code.

Retention policies are vague. "As long as necessary." "Backups and archiving." "Currently disabled." The timelines are either unspecified or contain escape clauses broad enough to be meaningless.

No one has verified compliance. The opt-out is a promise. The promise has not been independently tested. You are trusting the company that wants your data to honor your request that they not use it.

The enterprise tier is a different product. If you pay enterprise prices, you get real privacy infrastructure: zero data retention, encryption key management, on-prem deployment, stateless processing. If you don't, you get a toggle and a privacy policy written by the company's lawyers, not yours.

· · ·

The Honest Spectrum

Not every tool is equal. Some are trying. Some are not.

Tabnine makes the strongest claims: zero code retention, no training, no third-party sharing, air-gapped deployment available.24 If accurate, it's the best privacy posture in the industry. The claims are unverified by independent audit. Trust accordingly.

Claude Code has the most transparent documentation: every data flow mapped, every telemetry channel individually disableable, clear per-provider defaults.10 The default-On training toggle and the /feedback trap undercut the transparency. But the mechanism for full opt-out exists and is documented, which is more than most.

JetBrains offers the most granular data taxonomy: four distinct categories with different consent models. Community editions collect nothing.25 The "behavioral data" category trains ML models by default for non-commercial users — a distinction between "behavioral ML" and "code-generation ML" that may matter less than it appears.

GitHub Copilot doesn't train on your code. Currently.8 This morning's announcement put a date on "currently": April 24, when interaction data from Free, Pro, and Pro+ users starts feeding model training.26

Google Gemini is two products wearing one name: enterprise-grade privacy for paying customers, and human reviewers reading your code on the free tier with a broken opt-out mechanism.3,7

Windsurf charges you a feature for opting out.4 That's not a privacy policy. That's a hostage negotiation.

Disclosure

This article was researched and written with the assistance of Claude, an AI made by Anthropic. Anthropic's Claude Code is one of the nine tools examined in this piece and is held to the same standard as every other tool. The irony of using an AI coding tool to investigate AI coding tool privacy is the point, not the problem. Corrections and reader perspectives welcome at nadia@sloppish.com.

Citations

  1. Cursor privacy policy and data use page. Privacy Mode OFF by default for individual users; ON by default for Business/Enterprise. ~50% of users have Privacy Mode enabled. Privacy · Data Use.
  2. Amazon Q Developer service improvement documentation. Free tier trains on code by default; opt-out requires AWS Organizations policy configuration. "Currently" language in GitHub preview. Service Improvement · Data Protection.
  3. Google Gemini Code Assist privacy notice (Individual). Human reviewers "read, annotate, and process the data." Retention up to 18 months. Link.
  4. Windsurf (Codeium/Exafunction) terms of service and privacy policy. Section 10.2: "if you opt out, you will not have access to Chat Services." ToS · Privacy Policy.
  5. Anthropic consumer terms update, August 28, 2025. Training toggle defaulted to "On." September 28 deadline for existing users. TechCrunch · Anthropic.
  6. Google Gemini Code Assist enterprise data governance. Stateless processing, ISO certifications, IP indemnification. Data Governance · Security/Privacy.
  7. Gemini CLI opt-out bug. Issue #14104, tagged p1/security. /privacy command creates circular redirect loop. GitHub Issue.
  8. GitHub Copilot privacy statement and data practices. Training "currently disabled for everyone." User Engagement Data collected regardless of settings. Privacy Statement · Analysis.
  9. "Can You Trust Your Copilot? A Privacy Scorecard for AI Coding Assistants." September 2025. Evaluated five tools across 14 criteria. arxiv.
  10. Claude Code data usage documentation. Env var controls, /feedback retention policy, per-provider defaults. Link.
  11. GitHub Copilot metrics and dashboard documentation. Per-user acceptance rates, lines of code, PR metrics, 28-day trends. Link.
  12. Microsoft Viva Insights and Copilot Dashboard. Adoption rates by manager/region/job function, "estimated financial savings," behavior shifts. Disclaimer: "not designed to enable employee evaluation, tracking, automated decision making, profiling, or monitoring." Link.
  13. CCPA/CPRA automated decision-making regulations (2025). CPPA rules covering employment decisions and generative AI systems. Link.
  14. Colorado SB 24-205. First comprehensive U.S. state AI consumer protection law. Originally effective February 1, 2026; delayed to June 30, 2026 by SB 25B-004. AG enforcement authority for algorithmic discrimination in high-risk AI systems. Link.
  15. SOC 2 audit limitations. SOC 2 Type II verifies controls exist, not that marketed privacy claims are honored in practice.
  16. Cursor forum thread on Privacy Mode concerns. Users unable to verify privacy claims for NDA compliance. Link.
  17. Consumer Reports / Wesleyan University: study of 40 online retailers found 30% ignored GPC opt-out requests; separate Wesleyan analysis found only 45% of websites complied with opt-out signals. Consumer Reports.
  18. CamoLeak vulnerability (2025). CVSS 9.6. Exfiltrated private code via Copilot Chat image tags. Link.
  19. RoguePilot vulnerability. Prompt injection via GitHub issues exfiltrated GITHUB_TOKEN. Link.
  20. GitGuardian analysis. 6.4% of repos with active Copilot leaked at least one secret — 40% above baseline. Link.
  21. Truffle Security. ~12,000 live secrets found in ~400 TB of Common Crawl training data. Link.
  22. EDPB opinion on AI models and GDPR (December 2024). Addresses anonymization assessment for AI models, legitimate interest basis, and consequences of unlawful training data. Link.
  23. EU AI Act timeline. GPAI transparency obligations binding August 2, 2025. Link.
  24. Tabnine privacy claims. Zero code retention, no training, no third-party sharing. No independent audit identified. Privacy · Policy.
  25. JetBrains AI data collection policy. Four-category taxonomy: anonymous telemetry, behavioral data, web products, detailed code-related data (opt-in only). Policy · Retention.
  26. GitHub Blog, "Updates to GitHub Copilot interaction data usage policy," March 25, 2026. Starting April 24, interaction data from Free/Pro/Pro+ users used for AI model training. Opt-out via Settings > Privacy dropdown. Link.