Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding

From: Gerald Schaefer
Date: Tue Sep 08 2020 - 14:02:48 EST

On Tue, 8 Sep 2020 07:30:50 -0700
Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 9/7/20 11:00 AM, Gerald Schaefer wrote:
> > Commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast
> > code") introduced a subtle but severe bug on s390 with gup_fast, due to
> > dynamic page table folding.
> Would it be fair to say that the "fake" page table entries s390
> allocates on the stack are what's causing the trouble here? That might
> be a nice thing to open up with here. "Dynamic page table folding"
> really means nothing to me.

We do not really allocate anything on the stack, it is the generic logic
from gup_fast that passes over pXd values (read once before), and using
pointers to such (stack) variables instead of real pXd pointers.
That, combined with the fact that we just return the passed in pointer in
pXd_offset() for folded levels.

That works similar on x86 IIUC, but with static folding, and thus also
proper pXd_addr_end() results because of statically (and correspondingly)
defined Pxd_INDEX/SHIFT. We always have static 5-level PxD_INDEX/SHIFT, and
that cannot really be made dynamic, so we just make pXd_addr_end()
dynamic instead, and that requires the pXd value to determine the correct
pagetable level.

Still makes my head spin when trying to explain, sorry. It is a very
special s390 oddity, or lets call it "feature", because I don't think any
other architecture has "dynamic pagetable folding" capability, depending
on process requirements, for whatever it is worth...

> > @@ -2521,7 +2521,7 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> > do {
> > pmd_t pmd = READ_ONCE(*pmdp);
> >
> > - next = pmd_addr_end(addr, end);
> > + next = pmd_addr_end_folded(pmd, addr, end);
> > if (!pmd_present(pmd))
> > return 0;
> It looks like you fix this up later, but this would be a problem if left
> this way. There's no documentation for whether I use
> pmd_addr_end_folded() or pmd_addr_end() when writing a page table walker.

Yes, that is very unfortunate. We did have some lengthy comment in
include/linux/pgtable.h where the pXd_addr_end(_folded) were defined.
But that was moved to arch/s390/include/asm/pgtable.h in this version,
probably because we already had the generalization in mind, where we
would not need such explanation in common header any more.

So, it might help better understand the issue that we have with
dynamic page table folding and READ_ONCE-style pagetable walkers
when looking at that comment.

Thanks for pointing out, that comment should definitely go into
include/linux/pgtable.h again. At least if we would still go for
that "s390 fix first, generalization second" approach, but it
seems we have other / better options now.