Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

From: Palmer Dabbelt
Date: Wed Oct 16 2019 - 19:22:17 EST


On Tue, 08 Oct 2019 05:39:44 PDT (-0700), will@xxxxxxxxxx wrote:
On Tue, Oct 08, 2019 at 02:19:05AM +0000, Justin He (Arm Technology China) wrote:
> -----Original Message-----
> From: Will Deacon <will@xxxxxxxxxx>
> Sent: October 1, 2019 20:54
> To: Justin He (Arm Technology China) <Justin.He@xxxxxxx>
> Cc: Catalin Marinas <Catalin.Marinas@xxxxxxx>; Mark Rutland
> <Mark.Rutland@xxxxxxx>; James Morse <James.Morse@xxxxxxx>; Marc
> Zyngier <maz@xxxxxxxxxx>; Matthew Wilcox <willy@xxxxxxxxxxxxx>; Kirill A.
> Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>; linux-arm-
> kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> mm@xxxxxxxxx; Punit Agrawal <punitagrawal@xxxxxxxxx>; Thomas
> Gleixner <tglx@xxxxxxxxxxxxx>; Andrew Morton <akpm@linux-
> foundation.org>; hejianet@xxxxxxxxx; Kaly Xin (Arm Technology China)
> <Kaly.Xin@xxxxxxx>
> Subject: Re: [PATCH v10 3/3] mm: fix double page fault on arm64 if PTE_AF
> is cleared
>
> On Mon, Sep 30, 2019 at 09:57:40AM +0800, Jia He wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index b1ca51a079f2..1f56b0118ef5 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
> > 2;
> > #endif
> >
> > +#ifndef arch_faults_on_old_pte
> > +static inline bool arch_faults_on_old_pte(void)
> > +{
> > > +	return false;
> > +}
> > +#endif
>
> Kirill has acked this, so I'm happy to take the patch as-is, however isn't
> it the case that /most/ architectures will want to return true for
> arch_faults_on_old_pte()? In which case, wouldn't it make more sense for
> that to be the default, and have x86 and arm64 provide an override? For
> example, aren't most architectures still going to hit the double fault
> scenario even with your patch applied?

No. After applying my patch series, only architectures that neither set the
access flag in hardware nor implement their own arch_faults_on_old_pte()
will hit the double page fault.

A return value of true from arch_faults_on_old_pte() means "this arch does not
set the access flag in hardware, so accessing an old pte might cause a page fault".
I don't want to change other architectures' default behavior here, so by default
arch_faults_on_old_pte() returns false.
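
The opt-in mechanism described above can be sketched in plain userspace C.
This is only an illustration of the #ifndef override pattern from the quoted
hunk, not kernel code: the "arch header" section, the return value of true,
and the helper names copy_must_mark_pte_young() and check_override_is_used()
are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for an architecture header (hypothetical). We assume this
 * arch does not update the access flag in hardware, so it opts in.
 */
static inline bool arch_faults_on_old_pte(void)
{
	return true;
}
/* A macro with the same name tells generic code that an override exists. */
#define arch_faults_on_old_pte arch_faults_on_old_pte

/* Generic fallback, shaped like the quoted mm/memory.c hunk. */
#ifndef arch_faults_on_old_pte
static inline bool arch_faults_on_old_pte(void)
{
	return false;
}
#endif

/* Hypothetical caller: should the PTE be marked young before copying? */
bool copy_must_mark_pte_young(void)
{
	return arch_faults_on_old_pte();
}

/* Small self-check: the override above must win over the fallback. */
void check_override_is_used(void)
{
	assert(copy_must_mark_pte_young());
}
```

If the "arch header" section is removed, the fallback is compiled instead and
the hook returns false, which is the default behavior under discussion.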

...and my complaint is that this is the majority of supported architectures,
so you're fixing something for arm64 which also affects arm, powerpc,
alpha, mips, riscv, ...

Chances are, they won't even realise they need to implement
arch_faults_on_old_pte() until somebody runs into the double fault and
wastes lots of time debugging it before they spot your patch.
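
The concern above can be made concrete with a toy model. This is a deliberately
simplified sketch, not the kernel's fault path: struct toy_pte, toy_access() and
toy_copy_faults() are hypothetical names, and the model assumes that touching an
old PTE costs exactly one fault on hardware that does not set the access flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy PTE: only the "young" (accessed) bit matters here. */
struct toy_pte {
	bool young;
};

/*
 * Toy CPU access. Without hardware access-flag updates, touching an old
 * PTE takes a fault and the (simulated) handler marks it young.
 * Returns the number of faults taken by this access.
 */
static int toy_access(struct toy_pte *pte, bool hw_sets_af)
{
	assert(pte != NULL);
	if (pte->young)
		return 0;
	pte->young = true;	/* set by hardware, or by the fault handler */
	return hw_sets_af ? 0 : 1;
}

/*
 * Toy copy done from within a fault handler. If the arch hook reports
 * that old PTEs fault, the PTE is marked young first so that the copy
 * does not fault again. Returns the extra faults taken by the copy.
 */
int toy_copy_faults(bool hook_returns_true, bool hw_sets_af)
{
	struct toy_pte pte = { .young = false };

	if (hook_returns_true)
		pte.young = true;
	return toy_access(&pte, hw_sets_af);
}
```

In this model, toy_copy_faults(false, false) is the case being complained
about: the hardware does not set the access flag, the hook keeps its default
of false, and the copy takes an extra fault.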

If I understand the semantics correctly, we should have this set to true. I don't have any context here, but we've got

	/*
	 * The kernel assumes that TLBs don't cache invalid
	 * entries, but in RISC-V, SFENCE.VMA specifies an
	 * ordering constraint, not a cache flush; it is
	 * necessary even after writing invalid entries.
	 */
	local_flush_tlb_page(addr);

in do_page_fault().

Btw, so far I have only observed this double page fault on an arm64 guest
(host is ThunderX2). On an x86 guest (host is Intel(R) Core(TM) i7-4790 CPU
@ 3.60GHz), there is no such double page fault. x86 has a similar mechanism
for setting the access flag in hardware.

Right, and that's why I'm not concerned about x86 for this problem.

Will