Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

From: Ira Weiny
Date: Tue Dec 08 2020 - 21:23:32 EST


On Tue, Dec 08, 2020 at 03:40:52PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 2:49 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> [..]
> > > So what's your preferred poison?
> > >
> > > 1. Corrupt random data in whatever's been mapped into the next page (which
> > > is what the helpers currently do)
> >
> > Please no.
>
> My assertion is that the kernel can't know it's corruption, it can
> only know that the driver is abusing the API. So over-copy and WARN
> seems better than violently regress by crashing what might have been
> working silently before.

Right now we have a mixed bag. zero_user() [and its variants, circa 2008]
does a BUG_ON.[0] The others do nothing: clear_highpage(),
clear_user_highpage(), copy_user_highpage(), and copy_highpage().

While continuing to audit the code I don't see any users who would violate
the API with a simple conversion of the code. The calls which I have worked on
[which are many at this point] all have checks in place which are well aware of
page boundaries.
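
To illustrate what "checks in place" typically look like, here is a minimal
userspace sketch, not code from any of the audited callers. The names
sketch_copy_to_pages(), sketch_split_demo() and SKETCH_PAGE_SIZE are made up
for illustration, and a 4 KiB page size is assumed. The caller clamps each
chunk to the end of the current page, so no single copy can cross a boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Hypothetical caller: copy len bytes from src into an array of separately
 * allocated pages, starting at byte position pos.  Each chunk is clamped to
 * the end of the current page, so no single memcpy() crosses a boundary.
 * Returns the number of chunks copied.
 */
size_t sketch_copy_to_pages(unsigned char **pages, size_t pos,
			    const unsigned char *src, size_t len)
{
	size_t chunks = 0;

	while (len > 0) {
		size_t off = pos % SKETCH_PAGE_SIZE;
		size_t n = SKETCH_PAGE_SIZE - off;	/* room left in this page */

		if (n > len)
			n = len;
		memcpy(pages[pos / SKETCH_PAGE_SIZE] + off, src, n);
		src += n;
		pos += n;
		len -= n;
		chunks++;
	}
	return chunks;
}

/* Usage: 8 bytes starting 4 bytes before the end of page 0 take two chunks. */
size_t sketch_split_demo(void)
{
	static unsigned char page0[SKETCH_PAGE_SIZE], page1[SKETCH_PAGE_SIZE];
	unsigned char *pages[] = { page0, page1 };
	size_t chunks;

	chunks = sketch_copy_to_pages(pages, SKETCH_PAGE_SIZE - 4,
				      (const unsigned char *)"abcdefgh", 8);
	assert(memcmp(page0 + SKETCH_PAGE_SIZE - 4, "abcd", 4) == 0);
	assert(memcmp(page1, "efgh", 4) == 0);
	return chunks;
}
```

With callers shaped like this, the per-page helper never sees an out-of-range
offset/length pair, which is why a check inside the helper would only ever
fire on a genuinely buggy caller.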

Therefore, I tend to agree with Dan that if anything is to be done it should be
a WARN_ON(). That only reports that something has probably been wrong all along
and should be fixed, while the code continues running as before.
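
As a rough illustration of the "over-copy and WARN" behavior, here is a
userspace sketch under stated assumptions. sketch_warn_on() is a stand-in for
the kernel's WARN_ON(), and sketch_memcpy_to_page() is only loosely modeled on
memcpy_to_page(); neither is the code from the patch. A 4 KiB page size is
assumed, and the demo uses a two-page buffer so the deliberate over-copy stays
inside memory the program owns:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

unsigned long sketch_warn_count;

/* Stand-in for WARN_ON(): report the problem, then let the caller continue. */
int sketch_warn_on(int cond, const char *what)
{
	if (cond) {
		sketch_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical helper: warn when offset + len runs past the page, but still
 * perform the copy, so previously "working" callers behave as before.
 * The check is written to avoid overflow in offset + len.
 */
void sketch_memcpy_to_page(unsigned char *page, size_t offset,
			   const void *from, size_t len)
{
	sketch_warn_on(offset > SKETCH_PAGE_SIZE ||
		       len > SKETCH_PAGE_SIZE - offset,
		       "copy runs past the end of the page");
	memcpy(page + offset, from, len);
}

/* Usage: returns how many warnings the two copies below triggered. */
unsigned long sketch_warn_demo(void)
{
	static unsigned char two_pages[2 * SKETCH_PAGE_SIZE];
	unsigned long before = sketch_warn_count;

	/* In bounds: silent. */
	sketch_memcpy_to_page(two_pages, 16, "abcd", 4);
	assert(sketch_warn_count == before);

	/* Two bytes spill into the next page: warns, but the copy still lands. */
	sketch_memcpy_to_page(two_pages, SKETCH_PAGE_SIZE - 2, "wxyz", 4);
	assert(two_pages[SKETCH_PAGE_SIZE] == 'y');

	return sketch_warn_count - before;
}
```

Swapping the warning for an abort() in this sketch would model the BUG_ON()
alternative: the second copy would take the whole program down instead of
being reported.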

BUG_ON() is a very big hammer. And I don't think that Linus is going to
appreciate a BUG_ON here.[1] Callers of this API should be well aware that
they are operating on a page and that specifying parameters beyond the bounds
of a page is going to have bad consequences...

Furthermore, I'm still leery of adding the WARN_ON's because Greg KH says many
people will be converting them to BUG_ON's via panic-on-warn anyway. But at
least that is their choice.

FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
we know we might be getting wrong".[1] And because, "BUG() is only good for
something that never happens and that we really have no other option for".[2]

IMO, these calls are like memcpy/memmove. memcpy/memmove don't validate bounds
and developers have lived with those constructs for a long time.

Ira

[0] BTW, after writing this email and researching the URLs below, I think this
BUG_ON() is also probably wrong...

[1]
<quote>
...
It's [BUG_ON] not a "let's check that
everybody did things right", it's a "this is a major design rule in
this core code".
...
</quote>
-- Linus (https://lkml.org/lkml/2016/10/4/337)

[2] https://yarchive.net/comp/linux/BUG.html