Re: LLM based rewrites
From: James Bottomley
Date: Mon Mar 09 2026 - 14:20:06 EST

On Mon, 2026-03-09 at 09:55 -0700, H. Peter Anvin wrote:
> On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@xxxxxxx>
> wrote:
> > Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
> >
> > > On Mon, 09 Mar 2026 08:31:03 -0700
> > > "H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
> > >
> > > > It is somewhat hard to see how that would constitute a "clean-
> > > > room"
> > > > rewrite. A clean-room rewrite entails two teams, one (the
> > > > "clean" room)
> > > > which must be certified to have never seen the code in
> > > > question, and all
> > > > communications between the two teams must be auditable.
> > >
> > > I was thinking the same.
> >
> > The argumentation that is being made (which I am trying to
> > reproduce but
> > am *not* advocating) is that "a clean-room rewrite is just one
> > means to
> > an end" and that, in this specific case, the code being rewritten
> > was
> > explicitly excluded from the context given to the bot (though that
> > turns
> > out not to entirely be the case). In theory, it only had the
> > desired
> > API and a set of tests available to it.
> >
> > The fact that every version of chardet was surely in its training
> > data
> > is not deemed to be relevant.
> >
> > jon
> >
>
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard

Agreed. The specific problem is that the US copyright definition of
derivation presumes that if you've had exposure to the original work
then anything you produce that's similar is a derivative. That doesn't
mean that you can't produce a non-derivative similar work; it's just
that the burden of proof shifts to you to prove that in creating the
similar work you didn't include any elements of the original. This is a
phenomenally difficult thing to prove in court (at least for humans),
which is why clean room reverse engineering was developed: you can
demonstrate the required non-exposure to the original by the
separation of the two teams.

I don't think LLMs will be able to come up with the necessary proof of
separation without essentially recreating the clean room process, which
becomes cost-prohibitive as the complexity of the work increases.

Regards,

James