Re: [RFC PATCH 00/16] 1GB THP support on x86_64

From: Jason Gunthorpe
Date: Thu Sep 03 2020 - 13:09:52 EST


On Thu, Sep 03, 2020 at 05:55:59PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 03, 2020 at 01:40:32PM -0300, Jason Gunthorpe wrote:
> > However if the sizeof(*pXX) is 8 on a 32 bit platform then load
> > tearing is a problem. At lest the various pXX_*() test functions
> > operate on a single 32 bit word so don't tear, but to to convert the
> > *pXX to a lower level page table pointer a coherent, untorn, read is
> > required.
> >
> > So, looking again, I remember now, I could never quite figure out why
> > gup_pmd_range() was safe to do:
> >
> > pmd_t pmd = READ_ONCE(*pmdp);
> > [..]
> > } else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
> > [..]
> > ptem = ptep = pte_offset_map(&pmd, addr);
> >
> > As I don't see what prevents load tearing a 64 bit pmd.. Eg no
> > pmd_trans_unstable() or equivalent here.
>
> I don't think there are any 32-bit page tables which support a PUD-sized
> page. Pretty sure x86 doesn't until you get to 4- or 5- level page tables
> (which need you to be running in 64-bit mode). There's not much utility
> in having 1GB of your 3GB process address space taken up by a single page.

Make sense for PUD, but why is the above GUP code OK for PMD?
pmd_trans_unstable() exists specifically to close read tearing races,
so it looks like a real problem?

> I'm OK if there are some oddball architectures which support it, but
> Linux doesn't.

So, based on that observation, I think something approximately like
this is needed for the page walker for PUD: (this has been on my
backlog to return to these patches..)