RE: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages

From: Elliott, Robert (Persistent Memory)
Date: Thu Aug 17 2017 - 18:30:37 EST




> -----Original Message-----
> From: Andrew Morton [mailto:akpm@xxxxxxxxxxxxxxxxxxxx]
> Sent: Thursday, August 17, 2017 5:10 PM
> To: Luck, Tony <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxx>;
> Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>; Elliott, Robert (Persistent
> Memory) <elliott@xxxxxxx>; x86@xxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1
> mappings of poison pages
>
> On Wed, 16 Aug 2017 10:18:03 -0700 "Luck, Tony" <tony.luck@xxxxxxxxx>
> wrote:
>
> > Speculative processor accesses may reference any memory that has a
> > valid page table entry. While a speculative access won't generate
> > a machine check, it will log the error in a machine check bank. That
> > could cause escalation of a subsequent error since the overflow bit
> > will be then set in the machine check bank status register.
> >
> > Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
> > address of the page we want to map out otherwise we may trigger the
> > very problem we are trying to avoid. We use a non-canonical address
> > that passes through the usual Linux table walking code to get to the
> > same "pte".
> >
> > Thanks to Dave Hansen for reviewing several iterations of this.
>
> It's unclear (to lil ole me) what the end-user-visible effects of this
> are.
>
> Could we please have a description of that? So a) people can
> understand your decision to cc:stable and b) people whose kernels are
> misbehaving can use your description to decide whether your patch might
> fix the issue their users are reporting.

The user-visible effect is that the system can halt because of
uncorrectable memory errors at addresses that software is not even
accessing.

The first error doesn't cause the crash. But if a second error happens
before the machine check handler services the first one, the handler
finds the Overflow bit set and cannot tell which errors, or how many,
occurred (e.g., an instruction fetch might have failed, so the
instructions the CPU is about to run are bogus). Halting is the only
safe thing to do.
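
As a rough illustration of the overflow case, here is a minimal C
sketch that decodes a machine check bank status value. The bit
positions (VAL = 63, OVER = 62, UC = 61) are the architectural x86
IA32_MCi_STATUS layout; the helper name is hypothetical and this is
not the kernel's actual handler logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in an x86 IA32_MCi_STATUS register. */
#define MCI_STATUS_VAL  (1ULL << 63)  /* bank holds a valid error record */
#define MCI_STATUS_OVER (1ULL << 62)  /* another error arrived before this one was cleared */
#define MCI_STATUS_UC   (1ULL << 61)  /* error was not corrected by hardware */

/*
 * Hypothetical helper: returns true when the bank holds an uncorrected
 * error and also reports overflow, i.e. software can no longer tell
 * which errors, or how many, occurred.
 */
bool bank_overflowed_uncorrected(uint64_t status)
{
    return (status & MCI_STATUS_VAL) &&
           (status & MCI_STATUS_OVER) &&
           (status & MCI_STATUS_UC);
}
```

A status with only VAL and UC set describes a single logged error; once
OVER is also set, the record is incomplete.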

For persistent memory, the BIOS reports known-bad addresses in the
ACPI ARS (address range scrub) table. They are likely to keep
reappearing every boot since it is persistent memory, so you can't
just reboot and hope they go away. Software is supposed to avoid
reading those addresses until it fixes them (e.g., writes new data
to those locations). Even if it follows this rule, the system can
still crash due to speculative reads (e.g., prefetches) touching
those addresses.

Tony's patch clears the PRESENT bit in the kernel 1:1 page table
entries for poisoned pages, so the CPU won't speculatively try to
read them.
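
To show what "not present" means at the page table entry level, here
is a minimal user-space sketch in C. It only manipulates a 64-bit
value; it is not the patch's code, which has to find and update the
real kernel page table entry. The names are hypothetical; on x86 the
present flag is bit 0 of the entry (the kernel calls it _PAGE_PRESENT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of an x86 page table entry: the mapping is valid when set. */
#define PTE_PRESENT_BIT (1ULL << 0)

/* Hypothetical helper: is this entry usable for address translation? */
bool pte_is_present(uint64_t pte)
{
    return (pte & PTE_PRESENT_BIT) != 0;
}

/*
 * Hypothetical helper: return the same entry with the present bit
 * cleared. The remaining bits (frame address, flags) are unchanged,
 * but the hardware will no longer translate through this entry, so
 * it cannot issue speculative reads to the page.
 */
uint64_t pte_clear_present(uint64_t pte)
{
    return pte & ~PTE_PRESENT_BIT;
}
```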

---
Robert Elliott, HPE Persistent Memory