Re: [PATCH] mm: Fix invalid page pointer returned with FOLL_PIN gups

From: John Hubbard
Date: Thu Jan 27 2022 - 21:32:34 EST


On 1/27/22 17:36, Peter Xu wrote:
On Thu, Jan 27, 2022 at 11:25:38AM -0400, Jason Gunthorpe wrote:
On Thu, Jan 27, 2022 at 05:19:56PM +0800, Peter Xu wrote:

diff --git a/mm/gup.c b/mm/gup.c
index f0af462ac1e2..8ebc04058e97 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -440,7 +440,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
pte_t *pte, unsigned int flags)
{
/* No page to get reference */
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return -EFAULT;

Yes. This clearly fixes the problem that the patch describes, and also
clearly matches up with the Fixes tag. So that's correct.
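A minimal userspace model of the one-line fix, for illustration only (not
kernel code; the flag values below are arbitrary assumptions, not the
kernel's definitions). Before the patch, a FOLL_PIN caller that hit a
PFN-only PTE was not told there was no page to pin; after it, both
reference-taking modes get -EFAULT:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values for this model only. */
#define FOLL_GET 0x01u
#define FOLL_PIN 0x02u

/* Pre-patch check in the modeled follow_pfn_pte(): only FOLL_GET
 * callers are told "no struct page to take a reference on". */
static int pfn_pte_check_old(unsigned int flags)
{
	if (flags & FOLL_GET)
		return -EFAULT;
	return 0;	/* caller carries on, possibly with no page */
}

/* Post-patch check: FOLL_PIN needs a struct page just as much. */
static int pfn_pte_check_new(unsigned int flags)
{
	if (flags & (FOLL_GET | FOLL_PIN))
		return -EFAULT;
	return 0;
}
```

With FOLL_PIN alone, the old check returns 0 and the new one returns
-EFAULT; a lookup with neither flag is unaffected.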

It is really confusing, though. Why not just always return -EEXIST
here?

Because in current code GUP handles -EEXIST and -EFAULT differently?

That has nothing to do with this. We shouldn't be deciding what the
top layer does way down here. Return the correct error code for what
was discovered at this layer, and let the upper loop decide what to
do about it.
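A sketch of the layering being argued for, as a userspace model with
hypothetical names (lower_lookup(), top_loop_decide() and struct lookup
are inventions for illustration, not gup.c functions). The lower layer
only reports what it found; the top loop alone decides whether -EEXIST
means "skip" or "fail":

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result of a page-table lookup in this model. */
struct lookup {
	int err;	/* 0, -EEXIST, or another negative errno */
	void *page;	/* meaningful only when err == 0 */
};

/* Lower layer: reports what it found, applies no policy. */
static struct lookup lower_lookup(int has_pte, int has_struct_page,
				  void *page)
{
	struct lookup r = { .err = 0, .page = NULL };

	if (!has_pte)
		r.err = -EFAULT;
	else if (!has_struct_page)
		r.err = -EEXIST;	/* mapping exists, no struct page */
	else
		r.page = page;
	return r;
}

/* Top loop: the only place that interprets -EEXIST.
 * Returns 0 to continue, 1 to skip this address, <0 to stop. */
static int top_loop_decide(struct lookup r, void **pages_slot)
{
	if (r.err == -EEXIST)
		return pages_slot ? -EFAULT : 1;
	if (r.err)
		return r.err;
	if (pages_slot)
		*pages_slot = r.page;
	return 0;
}
```

A NULL pages_slot stands in for "the caller passed no pages array", so
the same -EEXIST is skipped in that case and fails otherwise.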

We do bail out early on -EFAULT. -EEXIST was first introduced by Kirill
in 2015 so that some mlock() or mmap(MAP_POPULATE) calls on DAX would not
fail (1027e4436b6). Then, in 2017, it was used again for PUD-sized THP
(a00cc7d9dd93d), also on DAX. Both uses seem to serve the same goal, and
-EEXIST seems to have been designed so that it doesn't fail GUP
immediately.

It must fail GUP immediately if there is a pages list.

Right, but my point is that we don't have a single user of
follow_page_mask() returning -EEXIST with a non-NULL **page. Or did I
miss one?


What you are missing is that other people are potentially writing code
that we haven't seen yet, and that code may use follow_page_mask(). The
idea, therefore, is to make it a good building block.



Callers that want an early failure must pass in NULL for pages, it is
just that simple. It has nothing to do with the FOLL flags.

A WARN_ON would be appropriate to check the FOLL flags against the
pages argument, e.g. FOLL_GET without a pages array is nonsense and
should be aborted immediately. On the other hand, we avoid this by
construction within gup.c.

We have something like that already, although it's only a VM_BUG_ON(),
not a BUG_ON() or WARN_ON(), at the entry of __get_user_pages():

VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
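The condition inside that VM_BUG_ON() can be read as a small predicate.
A userspace sketch (the helper name and flag values are assumptions for
illustration): the !! turns both the pointer and the masked flags into 0
or 1, so the comparison fires exactly when "has a pages array" and "asked
for a reference" disagree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this model only. */
#define FOLL_GET 0x01u
#define FOLL_PIN 0x02u

/* Hypothetical helper: true when (pages, gup_flags) is inconsistent,
 * i.e. exactly when the VM_BUG_ON() above would trigger. */
static bool gup_args_inconsistent(void **pages, unsigned int gup_flags)
{
	return !!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN));
}
```

Both "pages without FOLL_GET/FOLL_PIN" and "FOLL_GET/FOLL_PIN without
pages" are flagged; the two consistent combinations pass.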


Here, however, I think we need to consider this a little more carefully,
and actually fix up this case. It is never going to be OK to return a
**pages array containing little landmines of potentially uninitialized
pointers. And so continuing on *at all* seems very wrong.

Indeed, it should just be like this:

@@ -1182,6 +1182,10 @@ static long __get_user_pages(struct mm_struct *mm,
* Proper page table entry exists, but no corresponding
* struct page.
*/
+ if (pages) {
+ page = ERR_PTR(-EFAULT);
+ goto out;
+ }
goto next_page;
} else if (IS_ERR(page)) {
ret = PTR_ERR(page);
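A simplified userspace model of the loop with that change applied (all
names are hypothetical; this is not __get_user_pages() itself). When an
entry has no struct page, the loop fails if it would otherwise leave a
hole in pages[], and skips the address only when there is no array to
fill. It returns the count done so far, or -EFAULT if nothing was done:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-address state for this model. */
enum pte_kind { HAS_PAGE, PFN_ONLY };

static long model_gup_loop(const enum pte_kind *ptes,
			   void *const *backing, long nr, void **pages)
{
	long i;

	for (i = 0; i < nr; i++) {
		if (ptes[i] == PFN_ONLY) {
			/* Entry exists, but has no struct page. */
			if (pages)
				break;	/* never leave pages[i] unset */
			continue;	/* no array to fill: skip it */
		}
		if (pages)
			pages[i] = backing[i];
	}
	/* Only reachable via the break above, i.e. with a pages array. */
	if (i < nr)
		return i ? i : -EFAULT;
	return nr;
}
```

Returning a partial count on failure mirrors the usual GUP convention of
reporting how many entries were filled; every returned entry is valid.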

IIUC, not failing immediately on -EEXIST seems to be what we want.

Which is what this does, for the only case it is acceptable - a null
page list.

From that POV, a WARN_ON_ONCE() does a better job of exposing an illegal
return of -EEXIST (as mentioned in the commit message) than the -EFAULT
conversion, IMHO.

Again, that is upside down: -EEXIST should not be an illegal return. It
should be valid, with a defined meaning ("the vaddr exists but has no
struct page"), and the top loop, and only the top loop, makes the
decision about what to do with it.

I believe this works too, and I think I get your point, but as stated
above, nothing uses it yet, so the path doesn't matter for any real
caller.

It's a really important point. This lower level routine needs to be fixed up
so that it behaves in a way that is both correct and reasonable. And it is not
reasonable to load up **pages with garbage pointers, regardless of whether any
callers are suffering from that yet.

Again, the argument that "no one uses this code path yet, so it's OK for it to
have pitfalls and weirdness" is not the way to go, at least, not if it's
feasible to provide a better alternative. And here, it is very feasible.


Especially with the above VM_BUG_ON(): if we ever get into the
"if (pages)" branch, we should already have triggered the VM_BUG_ON()
condition when entering the function.


And let's not forget that VM_BUG_ON() is a debug-only assertion. So you
may very well *not* have triggered anything.
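To make the "debug-only" point concrete, here is a userspace imitation of
the pattern (MODEL_DEBUG_VM, MODEL_VM_BUG_ON() and model_gup_entry() are
hypothetical stand-ins, and this counts hits instead of crashing). In the
kernel, VM_BUG_ON() becomes BUG_ON() only with CONFIG_DEBUG_VM; otherwise
its argument is merely type-checked via sizeof and never evaluated at
runtime, which the non-debug branch below imitates:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values for this model only. */
#define FOLL_GET 0x01u
#define FOLL_PIN 0x02u

static int assertion_hits;	/* counts checks that fired */

#ifdef MODEL_DEBUG_VM
#define MODEL_VM_BUG_ON(cond) \
	do { if (cond) assertion_hits++; } while (0)
#else
/* Non-debug build: compiled, type-checked, never executed. */
#define MODEL_VM_BUG_ON(cond) \
	do { (void)sizeof(cond); } while (0)
#endif

static int model_gup_entry(void **pages, unsigned int gup_flags)
{
	MODEL_VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
	return 0;	/* carries on regardless in a non-debug build */
}
```

Built without MODEL_DEBUG_VM, a misuse such as model_gup_entry(NULL,
FOLL_GET) goes unnoticed, which is why the defensive check in the loop
itself still matters.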

Even if code is functionally correct, there are still API design
considerations, such as "how usable is it?". See, for example,
Rusty Russell's "How do I make this hard to misuse?" rules [1].

[1] https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html



thanks,
--
John Hubbard
NVIDIA