Re: LLM based rewrites
From: H. Peter Anvin
Date: Mon Mar 09 2026 - 13:09:31 EST
On March 9, 2026 9:55:36 AM PDT, "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
>On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@xxxxxxx> wrote:
>>Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
>>
>>> On Mon, 09 Mar 2026 08:31:03 -0700
>>> "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
>>>
>>>> It is somewhat hard to see how that would constitute a "clean-room"
>>>> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
>>>> which must be certified to have never seen the code in question, and all
>>>> communications between the two teams must be auditable.
>>>
>>> I was thinking the same.
>>
>>The argumentation that is being made (which I am trying to reproduce but
>>am *not* advocating) is that "a clean-room rewrite is just one means to
>>an end" and that, in this specific case, the code being rewritten was
>>explicitly excluded from the context given to the bot (though that turns
>>out not to entirely be the case). In theory, it only had the desired
>>API and a set of tests available to it.
>>
>>The fact that every version of chardet was surely in its training data
>>is not deemed to be relevant.
>>
>>jon
>>
>
>That's a question for the lawyers and the courts, really. But it is most definitely *not* clean room. That being said, clean room is certainly not the only way to rewrite software that can pass legal muster, but it is the gold standard.
In the end, though, it comes down to the plain fact that LLMs have pushed copyright law into undefined territory. As Uber showed, a strategy of doing something that is, at best, legally questionable can sometimes succeed if you can spread broadly enough, quickly enough, that the political process overtakes the legal one.