Re: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

From: zhenwei pi
Date: Mon Jun 06 2022 - 03:25:13 EST




On 6/6/22 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:


On 6/5/22 02:56, Andrew Morton wrote:
On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <pizhenwei@xxxxxxxxxxxxx> wrote:

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Unpoisoning a hardware-corrupted page
only puts the page back into the buddy allocator, which leads to a BUG
when the page is later accessed through the corrupted KPTE.

Thank you for the patch. I think this will be helpful for integration testing.

You mention "hardware corrupted page" as the condition of this bug, and I
think that it means a real hardware error, but this BUG seems to be
triggered when we use mce-inject or APEI (these are also software injection
without corrupting the memory physically). So the actual condition is
"when memory_failure() is called by MCE handler"?


Yes, I use QEMU to emulate a 'real hardware error' with the command:
virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd 0x61234000 0x8c
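For readers decoding the injected status value (0xbd000000000000c0), here is a minimal sketch that tests the flag bits of IA32_MCi_STATUS. The bit positions follow the x86 MCA register layout and match the kernel's MCI_STATUS_* macros; the two helper functions are hypothetical names used for illustration only and are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of IA32_MCi_STATUS (same positions as the kernel's MCI_STATUS_* macros). */
#define MCI_STATUS_VAL   (1ULL << 63) /* register contents are valid */
#define MCI_STATUS_UC    (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_ADDRV (1ULL << 58) /* address register is valid */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S     (1ULL << 56) /* error was signaled */
#define MCI_STATUS_AR    (1ULL << 55) /* action required */

/*
 * Hypothetical helper: true if the status describes a valid, uncorrected,
 * signaled error with neither "action required" nor "context corrupt" set.
 */
static bool mce_status_is_action_optional(uint64_t status)
{
	const uint64_t must_set = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_S;
	const uint64_t must_clear = MCI_STATUS_AR | MCI_STATUS_PCC;

	return (status & must_set) == must_set && (status & must_clear) == 0;
}

/* Hypothetical helper: the low 16 bits carry the MCA error code. */
static uint16_t mce_status_mca_code(uint64_t status)
{
	return (uint16_t)(status & 0xffff);
}
```

With the value from the command above, mce_status_is_action_optional() returns true, MCI_STATUS_ADDRV is set (so the address argument 0x61234000 is meaningful), and the MCA error code is 0x00c0.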


Do not allow unpoisoning a hardware-corrupted page in unpoison_memory(),
to avoid a BUG like this:

Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
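Both numbers in this log derive from the injected physical address 0x61234000. The sketch below shows the arithmetic, assuming 4 KiB pages and the default x86-64 direct-map base for 4-level paging without KASLR (0xffff888000000000); on a kernel with a randomized base the virtual address will differ. The constant and function names are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12
/* Assumption: default x86-64 direct-map base (4-level paging, no KASLR). */
#define SKETCH_DIRECT_MAP_BASE 0xffff888000000000ULL

/* Physical address -> page frame number. */
static uint64_t sketch_phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Page frame number -> kernel direct-map virtual address of that page. */
static uint64_t sketch_pfn_to_direct_map(uint64_t pfn)
{
	return SKETCH_DIRECT_MAP_BASE + (pfn << SKETCH_PAGE_SHIFT);
}
```

Under these assumptions, physical address 0x61234000 maps to pfn 0x61234 (the "Software-unpoisoned page") and to direct-map address ffff888061234000 (the faulting address).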

Thanks.

--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
 {
 	struct page *page;
 	struct page *p;
+	pte_t *kpte;
 	int ret = -EBUSY;
 	int freeit = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
@@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
 	p = pfn_to_page(pfn);
 	page = compound_head(p);
+	kpte = virt_to_kpte((unsigned long)page_to_virt(p));
+	if (kpte && !pte_present(*kpte)) {
+		unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
+				 pfn, &unpoison_rs);

This prevents unpoisoning of hwpoisoned 4kB pages, but not of hugetlb pages,
where I see a similar BUG, as follows (even with your patch applied):

[ 917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
[ 917.810144] #PF: supervisor write access in kernel mode
[ 917.812588] #PF: error_code(0x0002) - not-present page
[ 917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
[ 917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
[ 917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
[ 917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[ 917.829762] RIP: 0010:clear_page_erms+0x7/0x10
[ 917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
[ 917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
[ 917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
[ 917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
[ 917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
[ 917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
[ 917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
[ 917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
[ 917.863433] Call Trace:
[ 917.864266] <TASK>
[ 917.864961] clear_huge_page+0x147/0x270
[ 917.866236] hugetlb_fault+0x440/0xad0
[ 917.867366] handle_mm_fault+0x270/0x290
[ 917.868532] do_user_addr_fault+0x1c3/0x680
[ 917.869768] exc_page_fault+0x6c/0x160
[ 917.870912] ? asm_exc_page_fault+0x8/0x30
[ 917.872082] asm_exc_page_fault+0x1e/0x30
[ 917.873220] RIP: 0033:0x7f2aeb8ba367

I can't think of a workaround for this right now ...


Could you please tell me how to reproduce this issue?

+		return -EPERM;

Is -EOPNOTSUPP a better error code?


OK!

+	}
+
 	mutex_lock(&mf_mutex);
 	if (!PageHWPoison(p)) {

I guess we don't want to let fault injection crash the kernel, so a
cc:stable seems appropriate here.

Can we think up a suitable Fixes: commit? I'm suspecting this bug has
been there for a long time?


Sure!

2009-Dec-16, hwpoison_unpoison() was introduced into Linux in commit:
847ce401df392("HWPOISON: Add unpoisoning support")
...
There is no hardware level unpoisioning, so this cannot be used for real
memory errors, only for software injected errors.
...

Both the commit log and the comment in the source code say that this
function should be used for software-level unpoisoning only. Unfortunately,
there is no such check in hwpoison_unpoison().


2020-May-20, 17fae1294ad9d("x86/{mce,mm}: Unmap the entire page if the whole
page is affected and poisoned")

This clears the KPTE, which leads to the BUG (described in this patch) when
unpoisoning a hardware-corrupted page.


Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")

Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Cc: Tony Luck <tony.luck@xxxxxxxxx>

Thanks for checking the history, I agree with sending to stable.

Thanks,
Naoya Horiguchi

--
zhenwei pi