Re: [PATCH] mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages

From: Dan Williams
Date: Tue Jun 27 2017 - 18:09:33 EST


On Tue, Jun 27, 2017 at 3:04 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>> > > > +if (set_memory_np(decoy_addr, 1))
>> > > > +pr_warn("Could not invalidate pfn=0x%lx from 1:1 map \n",
>>
>> Another concept to consider is mapping the page as UC rather than
>> completely unmapping it.
>
> UC would also avoid the speculative prefetch issue. The SDM (Vol 3, Section 11.3) says:
>
> Strong Uncacheable (UC) - System memory locations are not cached. All reads and writes
> appear on the system bus and are executed in program order without reordering. No speculative
> memory accesses, pagetable walks, or prefetches of speculated branch targets are made.
> This type of cache-control is useful for memory-mapped I/O devices. When used with normal
> RAM, it greatly reduces processor performance.
>
> But then I went and read the code for set_memory_uc() ... which calls "reserve_memtype()"
> which does all kinds of things to avoid issues with MTRRs and other stuff. Which all looks
> really more complex than we need here.
>
>> The uncorrectable error scope could be smaller than a page size, like:
>> * memory ECC width (e.g., 8 bytes)
>> * cache line size (e.g., 64 bytes)
>> * block device logical block size (e.g., 512 bytes, for persistent memory)
>>
>> UC preserves the ability to access adjacent data within the page that
>> hasn't gone bad, and is particularly useful for persistent memory.
>
> If you want to dig into the non-poisoned pieces of the page later it might be
> better to set up a new scratch UC mapping to do that.
>
> My takeaway from Dan's comments on unpoisoning is that this isn't the context
> that he wants to do that. He'd rather wait until he has somebody overwriting the
> page with fresh data.
>
> So I think I'd like to keep the patch as-is.

Yes, the persistent-memory poison interactions should be handled
separately and not hold up this patch for the normal system-memory
case. We might dovetail support for this into stray write protection,
where we unmap all of pmem while nothing in the kernel is actively
accessing it.
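
For reference, the error-scope point quoted above (8-byte ECC width,
64-byte cache line, 512-byte logical block) can be put into numbers with
a small user-space sketch. This is not kernel code and not part of the
patch; the helper names, the SKETCH_PAGE_SIZE constant, and the 4 KiB
page size are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB). */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical helper: true if unit is a non-zero power of two. */
static int sketch_is_pow2(uint64_t unit)
{
	return unit != 0 && (unit & (unit - 1)) == 0;
}

/*
 * Hypothetical helper: start address of the poisoned unit that
 * contains addr, for a given error granularity (8, 64, 512, ...).
 */
static uint64_t poison_unit_start(uint64_t addr, uint64_t unit)
{
	assert(sketch_is_pow2(unit));
	return addr & ~(unit - 1);
}

/*
 * Hypothetical helper: bytes of the page that are still good if
 * exactly one unit of the given granularity has gone bad.
 */
static uint64_t page_bytes_still_good(uint64_t unit)
{
	assert(sketch_is_pow2(unit) && unit <= SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE - unit;
}
```

With a single bad cache line, page_bytes_still_good(64) leaves 4032 of
4096 bytes intact, which is the data a UC (or scratch UC) mapping would
still let you reach, and which unmapping the whole page gives up.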