Machine Learning: Free Software Foundation Targets GitHub Copilot
The Free Software Foundation (FSF) has issued a call for white papers on GitHub Copilot. The submissions are meant to analyze the effects of the machine learning assistant on the free software community, which raise numerous questions. The blog post announcing the call promises that the organization will read all submitted white papers and pay a reward of $500 for each one it publishes.
At the same time, the post makes clear that, from the FSF's point of view, Copilot is “unacceptable and unfair”, since using it requires Microsoft's Visual Studio or Visual Studio Code, software that in the foundation's view is not free software. It is worth noting here that the source code editor Visual Studio Code is free of charge and largely open source, but far from free software in the FSF's sense.
Copilot as a pair programmer
The white papers are not meant to deal with the tools themselves, but with the open questions around using machine learning (ML) as a coding aid. The Copilot service, launched in June, helps developers write code. GitHub calls it an “AI pair programmer” that, as in pair programming, makes suggestions for improving and extending the source code.
The technical basis is the Codex ML system developed by OpenAI, which converts natural language into source code. For example, Copilot tries to generate suitable source code from a comment like
// Get average runtime of successful runs in seconds
It also creates boilerplate code such as getters and setters, fills in repetitive definitions and suggests suitable unit tests.
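To make the idea concrete, here is a hand-written sketch of the kind of function such a comment might lead to. It is not actual Copilot output. The Run type, its field names and the function name are assumptions made up for this illustration, and the prompt comment is written in Python's # syntax rather than the // form quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Hypothetical record of one job execution (an assumption, not from the article)."""
    duration_ms: int  # runtime in milliseconds
    failed: bool      # True if the run did not succeed


# Get average runtime of successful runs in seconds
def average_runtime_seconds(runs: List[Run]) -> float:
    # Keep only the durations of runs that succeeded.
    successful = [run.duration_ms for run in runs if not run.failed]
    if not successful:
        # Avoid a division by zero when nothing succeeded.
        raise ValueError("no successful runs")
    # Average in milliseconds, then convert to seconds.
    return sum(successful) / len(successful) / 1000
```

Even in a small case like this, a suggestion has to be reviewed: whether failed runs are excluded and whether the unit conversion is right are exactly the details a generated completion can get wrong.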
Copyright and legal aspects
Copilot draws its “knowledge” from numerous openly accessible repositories, without GitHub having explicitly asked their maintainers. This harvesting of code quickly drew accusations, including on Twitter. The FSF blog post does not attack GitHub Copilot directly in this regard, but the questions it asks let the intention shine through and thus carry considerable rhetorical weight.
According to the FSF, the motivation for the call for white papers that has just started is a flood of inquiries about the foundation’s position on the open questions surrounding Copilot. According to the post, developers want to know whether training an artificial neural network on their software can be considered fair use. Those who are in principle interested in using Copilot, on the other hand, wonder whether elements such as code snippets copied from GitHub repositories could lead to copyright infringement. In addition, activists ask whether it is fundamentally unfair to build a commercial service on the basis of their work.
Copyright issues in particular keep cropping up in connection with machine learning applications. For such systems to be able to do anything, they must first be trained. What source code repositories are for Copilot, texts are for language models such as OpenAI's Generative Pre-trained Transformer 3 (GPT-3). Similar questions are likely to arise in the area of image generation, for example with DALL-E, which was also developed by OpenAI and is based on GPT-3.
GitHub is aware of the problem and addresses some points in the FAQ at the bottom of the Copilot page. According to the FAQ, large parts of the ML community consider training on publicly available data to be fair use. However, since this is new territory, GitHub is interested in a discussion with developers on copyright and other topics in order to develop appropriate standards for training ML models.
A questionnaire as a template
The Free Software Foundation leaves the specific framing of the answers to those who contribute white papers on the topic. Among others, they should address the following questions:
- Does training on public repositories violate copyright law? Is it fair use?
- How likely is it that Copilot’s output will lead to actionable claims for violations of GPL-licensed works?
- How can developers ensure that code to which they hold the copyright is protected against infringement by Copilot?
- Does Copilot violate the AGPL (GNU Affero General Public License) if it learns from AGPL-covered code?
- Is the trained AI/ML model protected by copyright, and if so, who owns the copyright?
The call for white papers runs until August 23rd, and submissions should be sent to the email address [email protected]. Papers should be no longer than 3000 words and aimed at the free software movement where possible, but the organization will also consider more legally oriented texts.
The Free Software Foundation intends to review the submissions by September 20 and notify authors whether it will accept their papers for publication. Further details and the full questionnaire can be found on the FSF blog.