Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

From: zhenwei pi
Date: Sun Jun 05 2022 - 01:43:12 EST

On 6/5/22 02:56, Andrew Morton wrote:
> On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <pizhenwei@xxxxxxxxxxxxx> wrote:
>
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison (hwpoison-inject) only. Unpoisoning a hardware-corrupted page
>> only puts the page back into the buddy allocator, which leads to a BUG
>> when the page is later accessed through its non-present KPTE.
>>
>> Do not allow unpoisoning a hardware-corrupted page in unpoison_memory(),
>> to avoid a BUG like this:
>>
>> Unpoison: Software-unpoisoned page 0x61234
>> BUG: unable to handle page fault for address: ffff888061234000
>>
>> Thanks.
>>
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
>>  {
>>  	struct page *page;
>>  	struct page *p;
>> +	pte_t *kpte;
>>  	int ret = -EBUSY;
>>  	int freeit = 0;
>>  	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
>>  	p = pfn_to_page(pfn);
>>  	page = compound_head(p);
>> +	kpte = virt_to_kpte((unsigned long)page_to_virt(p));
>> +	if (kpte && !pte_present(*kpte)) {
>> +		unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
>> +				 pfn, &unpoison_rs);
>> +		return -EPERM;
>> +	}
>> +
>>  	mutex_lock(&mf_mutex);
>>  	if (!PageHWPoison(p)) {
>
> I guess we don't want to let fault injection crash the kernel, so a
> cc:stable seems appropriate here.
>
> Can we think up a suitable Fixes: commit? I'm suspecting this bug has
> been there for a long time?


Sure!

On 2009-Dec-16, hwpoison_unpoison() was introduced into Linux in commit
847ce401df392 ("HWPOISON: Add unpoisoning support"):
...
There is no hardware level unpoisioning, so this cannot be used for real memory errors, only for software injected errors.
...

Both the commit log and the comment in the source code state that this function is meant for software-level unpoisoning only. Unfortunately, hwpoison_unpoison() has no check to enforce this.


On 2020-May-20, commit 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned") was introduced.

It clears the KPTE of the poisoned page, which leads to the BUG described in this patch when the hardware-corrupted page is unpoisoned.


Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")

Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Cc: Tony Luck <tony.luck@xxxxxxxxx>

--
zhenwei pi