The Quiet Violence of GitHub's New Training Policy
Starting April 24, GitHub will train its AI on your code—unless you opt out. The real story isn't about defaults. It's about what we're teaching each other's competitors.
There's a line in GitHub Chief Product Officer Mario Rodriguez's announcement about the company's new data training policy that deserves more attention than it's getting. It's not the part about collecting your code snippets or the default opt-in. It's this: interaction data from Microsoft employees has already been used, and GitHub has "seen increased suggestion acceptance rates across multiple languages as a result."
The sentence is bland, corporate. But what it describes is a machine learning from how people work—their architectural decisions, their naming conventions, the patterns they use to solve problems in their specific domains. And starting April 24, if you use Copilot Free, Pro, or Pro+, that machine will be learning from you too.
What's Actually Being Collected
The scope is broader than most developers probably realize. According to the announcement, GitHub will collect:

- accepted or modified Copilot outputs
- inputs and code snippets sent to Copilot
- code context surrounding your cursor position
- comments and documentation
- file names and repository structure
- navigation patterns and interactions with Copilot features
- thumbs-up or thumbs-down feedback
Private repository code is included. GitHub makes a distinction between code "at rest"—which it says it doesn't access—and code actively sent to Copilot during a session. If you're working with Copilot open in a private repository, that code falls within the scope of the new policy.
The data may be shared with what GitHub calls "affiliates"—companies in the same corporate family, primarily Microsoft and its subsidiaries. Third-party model providers, the company says, do not receive this data for their own training purposes.
The Enterprise Exception
Copilot Business and Enterprise users are excluded from this change entirely. Their data has never been used for training and won't be under the new policy. This creates a clear two-tier system: enterprises get ironclad guarantees, while individual developers and small teams get a toggle switch buried in settings.
GitHub's FAQ attempts to address some of the organizational gray area. Interaction data from users who are members of, or outside collaborators on, a paid organization will be excluded from model training. Data from paid organization repositories is never used, regardless of the user's subscription tier.
But this leaves open a troubling scenario: a developer using a personal Pro license for work on proprietary code. One Reddit commenter pointed out what should be obvious—individual users within an organization typically don't have the authority to license their employer's source code to third parties. Yet the opt-out is enforced at the user level, not the organization level.
What You're Really Trading
The community response has focused heavily on the opt-in-by-default framing, which several developers in the GitHub community discussion called a "dark pattern." One user, burnhamup, noted that the email with instructions to disable the setting doesn't actually link to the page where you update your settings. Another pointed out that the opt-out setting isn't available through GitHub's mobile app.
But a Reddit commenter named NeatRuin7406 framed the issue more fundamentally: "When you use copilot, you're not just getting suggestions, you're implicitly teaching the model what good code looks like in your domain. Your proprietary patterns, architecture decisions, domain-specific idioms, naming conventions, all get folded into a general model. That model then improves suggestions for everyone else, including your direct competitors who use the same tool."
This isn't a privacy violation in the traditional sense. It's something more subtle: the gradual transfer of competitive advantage from individual developers and small companies to a shared model that disproportionately benefits those with the scale to exploit it.
The GDPR Question
Several commenters have raised concerns about GDPR compliance. GitHub cites "legitimate interest" as its lawful basis for processing personally identifiable information. But as one Reddit user noted, that basis may not hold up under EU law if the rights and freedoms of data subjects are found to override GitHub's interest.
The 30-day notice period—announced March 26 for an April 24 effective date—gives users time to opt out, but it also reveals something about how GitHub views this decision. This isn't presented as something requiring active consent. It's framed as an improvement you should accept unless you have specific objections.
The Model Collapse Concern
One thread on Reddit with over 1,000 upvotes raised a technical concern that gets at something deeper: model collapse. As AI-generated code makes up a growing share of GitHub repositories, training future models on that code creates a recursive loop where models learn from the output of other models.
Research published in Nature has shown that AI models degrade when trained recursively on generated data, losing information about the true distribution over time. The first signs appear at the tails—the unusual cases, the edge conditions, the innovative solutions that don't fit established patterns.
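The dynamic the researchers describe can be sketched with a toy simulation. This is a deliberately simplified model, not GitHub's actual training pipeline: "training" here just means fitting a Gaussian to the current data, and "generating" means sampling from that fit while discarding low-probability outputs, the way a model favors common patterns over rare ones. The tails, the unusual cases, erode with each generation.

```python
import random
import statistics

def next_generation(samples):
    """One round of recursive training: fit a Gaussian to the data,
    then generate a same-sized dataset from the fit, keeping only
    'high-probability' outputs (within 2 standard deviations).
    This mimics a model that prefers common patterns over rare ones."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    out = []
    while len(out) < len(samples):
        x = random.gauss(mu, sigma)
        if abs(x - mu) <= 2 * sigma:  # the tails are dropped here
            out.append(x)
    return out

random.seed(42)
# Generation 0: "human" data from a standard normal distribution.
data = [random.gauss(0, 1) for _ in range(2000)]

for gen in range(11):
    if gen % 5 == 0:
        # The spread of the data shrinks with every generation.
        print(f"gen {gen:2d}: stdev = {statistics.stdev(data):.3f}")
    data = next_generation(data)
```

Each round, the standard deviation shrinks by a constant factor, so after a handful of generations the distribution has collapsed toward its mean: the edge cases that made the original data interesting are simply gone.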
GitHub is now proposing to train its models on interaction data that increasingly includes interactions with AI-generated code. The question isn't whether this will cause model collapse—it's how long before we notice.
What GitHub Acknowledges
To its credit, GitHub's FAQ notes that Microsoft, Anthropic, and JetBrains take similar approaches to using interaction data for model training. This is industry standard practice, which should tell us something about industry standards.
The company also states that users who previously opted out of GitHub's prompt and suggestion collection setting will have their preference carried over. You can opt out at any time through your Copilot settings under "Allow GitHub to use my data for AI model training."
The Real Choice
Here's what this policy actually forces: a decision about whether you believe the individual benefit of improved AI suggestions outweighs the collective cost of feeding your domain expertise into a model that will distribute it to everyone, including your competitors.
For Microsoft employees, that choice was made for them, presumably with the understanding that improving the model helps Microsoft's product. For enterprise customers, the choice is avoided entirely—they get the benefit without the cost.
For everyone else, there's a toggle switch and a deadline.
The announcement frames this as necessary for improvement. And it probably is—for GitHub. The company needs training data, and interaction data is more valuable than static code because it shows what humans actually accept and modify. But necessary for GitHub is not the same as necessary for developers.
We're being asked to teach the machine that will teach our competitors. And we have until April 24 to decide whether that's a trade we're willing to make.