Re: [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic()

From: Will Deacon
Date: Tue Jun 15 2021 - 05:46:52 EST


On Fri, Jun 11, 2021 at 04:42:49PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 11, 2021 at 12:05:42PM -0700, Hugh Dickins wrote:
> > > diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
> > > index e896ebef8c24cb..0bf1fdec928e71 100644
> > > +++ b/arch/x86/include/asm/pgtable-3level.h
> > > @@ -75,7 +75,7 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
> > > static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> > > {
> > > pmdval_t ret;
> > > - u32 *tmp = (u32 *)pmdp;
> > > + u32 *tmp = READ_ONCE((u32 *)pmdp);
> > >
> > > ret = (pmdval_t) (*tmp);
> > > if (ret) {
> > > @@ -84,7 +84,7 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> > > * or we can end up with a partial pmd.
> > > */
> > > smp_rmb();
> > > - ret |= ((pmdval_t)*(tmp + 1)) << 32;
> > > + ret |= READ_ONCE((pmdval_t)*(tmp + 1)) << 32;
> > > }
> >
> > Maybe that. Or maybe now (since Will's changes) it can just do
> > one READ_ONCE() of the whole, then adjust its local copy.
>
> I think the smb_rmb() is critical here to ensure a PTE table pointer
> is coherent, READ_ONCE is not a substitute, unless I am miss
> understanding what Will's changes are???

Yes, I agree that the barrier is needed here for x86 PAE. I would really
have liked to enforce native-sized access in READ_ONCE(), but unfortunately
there is plenty of code out there which is resilient to a 64-bit access
being split into two separate 32-bit accesses and so I wasn't able to go
that far.

That being said, pmd_read_atomic() probably _should_ be using READ_ONCE()
because using it inconsistently can give rise to broken codegen, e.g. if
you do:

pmdval_t x, y, z;

x = *pmdp; // Invalid
y = READ_ONCE(*pmdp); // Valid
if (pmd_valid(y))
z = *pmdp; // Invalid again!

Then the compiler can allocate the same register for x and z, but will
issue an additional load for y. If a concurrent update takes place to the
pmd which transitions from Invalid -> Valid, then it will look as though
things went back in time, because z will be stale. We actually hit this
on arm64 in practice [1].

Will

[1] https://lore.kernel.org/lkml/20171003114244.430374928@xxxxxxxxxxxxxxxxxxx/