Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

From: Jia He
Date: Fri Oct 18 2019 - 22:59:57 EST


Hi Palmer

On 2019/10/19 4:38, Palmer Dabbelt wrote:
On Wed, 16 Oct 2019 16:46:08 PDT (-0700), will@xxxxxxxxxx wrote:
Hey Palmer,

On Wed, Oct 16, 2019 at 04:21:59PM -0700, Palmer Dabbelt wrote:
On Tue, 08 Oct 2019 05:39:44 PDT (-0700), will@xxxxxxxxxx wrote:
> On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology China) wrote:
> > > On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index b1ca51a079f2..1f56b0118ef5 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > > >                      2;
> > > >  #endif
> > > >
> > > > +#ifndef arch_faults_on_old_pte
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +    return false;
> > > > +}
> > > > +#endif
> > >
> > > Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> > > it the case that /most/ architectures will want to return true for
> > > arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
> > > that to be the default, and have x86 and arm64 provide an override? For
> > > example, aren't most architectures still going to hit the double fault
> > > scenario even with your patch applied?
> >
> > No, after applying my patch series, only those architectures which don't set
> > the access flag in hardware AND don't implement their own arch_faults_on_old_pte
> > will hit the double page fault.
> >
> > The meaning of true for arch_faults_on_old_pte() is "this arch has no hardware
> > mechanism for setting the access flag, so it might take a page fault on an old pte".
> > I don't want to change other architectures' default behavior here. So by default,
> > arch_faults_on_old_pte() is false.
>
> ...and my complaint is that this is the majority of supported architectures,
> so you're fixing something for arm64 which also affects arm, powerpc,
> alpha, mips, riscv, ...
>
> Chances are, they won't even realise they need to implement
> arch_faults_on_old_pte() until somebody runs into the double fault and
> wastes lots of time debugging it before they spot your patch.

If I understand the semantics correctly, we should have this set to true. I
don't have any context here, but we've got

        /*
         * The kernel assumes that TLBs don't cache invalid
         * entries, but in RISC-V, SFENCE.VMA specifies an
         * ordering constraint, not a cache flush; it is
         * necessary even after writing invalid entries.
         */
        local_flush_tlb_page(addr);

in do_page_fault().

Ok, although I think this is really about whether or not your hardware can
make a pte young when accessed, or whether you take a fault and do it
by updating the pte explicitly.

v12 of the patches did change the default, so you should be "safe" with
those either way:

http://lists.infradead.org/pipermail/linux-arm-kernel/2019-October/686030.html

OK, that fence is because we allow invalid translations to be cached, which is a completely different issue.

RISC-V implementations are allowed to have software managed accessed/dirty bits. For some reason I thought we were relying on the firmware to handle this, but I can't actually find the code so I might be crazy. Wherever it's done, there's no spec enforcing it so we should leave this true on RISC-V.

Thanks for the confirmation. So we can keep the default arch_faults_on_old_pte (return true) on RISC-V.
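
For anyone following along, here is a minimal user-space sketch of how I understand the agreed default. It reuses the #ifndef pattern from the quoted mm/memory.c hunk, with the fallback flipped to return true. The helper must_mark_young_first() is a hypothetical name used only for illustration, it is not a kernel function, and the override idiom described in the comment is my assumption about how an architecture would opt out.

```c
#include <stdbool.h>

/*
 * Generic fallback: assume the architecture may fault on an old pte.
 * Assumption: an architecture whose hardware sets the access flag
 * would provide its own arch_faults_on_old_pte() in an arch header
 * and define a macro of the same name, so this fallback is skipped.
 */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return true;
}
#endif

/*
 * Hypothetical helper, for illustration only: report whether the
 * caller has to mark the pte young in software before accessing the
 * page, so that the access does not take a second fault.
 */
static inline bool must_mark_young_first(bool pte_is_young)
{
	return arch_faults_on_old_pte() && !pte_is_young;
}
```

With the fallback in effect (as it would be on RISC-V), must_mark_young_first() is true only for a pte that is not yet young.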


Thanks.


---
Cheers,
Justin (Jia He)