Re: [PATCH v4 0/7] HWpoison: further fixes and cleanups

From: HORIGUCHI NAOYA(堀口 直也)
Date: Thu Sep 17 2020 - 07:41:53 EST


On Thu, Sep 17, 2020 at 10:10:42AM +0200, Oscar Salvador wrote:
> This patchset includes some fixups (patch#1,patch#2 and patch#3)
> and some cleanups (patch#4-7).
>
> Patch#1 is a fix to take HWPoison pages off a buddy freelist, since
> leaving them there can lead to HWPoison pages coming back into use
> without anyone noticing.
> So fix it (we did that already for soft_offline_page [1]).
>
> Patch#2 is fixing a rebasing problem that made the call
> to page_handle_poison from _soft_offline_page set the
> wrong value for hugepage_or_freepage. [2]
>
> Patch#3 is not really a fixup, but tries to re-handle a page
> in case it was allocated under us.

Thanks for the update.
This patchset triggers the following BUG_ON() with Aristeu's workload:

[ 1010.400900] Soft offlining pfn 0xbff8c at process virtual address 0x7fe6c99c8000
[ 1010.402931] page:00000000f5670686 refcount:1 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0xbff89
[ 1010.405604] flags: 0xfffe000800000(hwpoison)
[ 1010.406755] raw: 000fffe000800000 ffffcddf029ab848 ffffcddf02ff9448 0000000000000000
[ 1010.408824] raw: 0000000000000001 0000000000000000 00000001ffffff7f 0000000000000000
[ 1010.410877] page dumped because: VM_BUG_ON_PAGE(page_count(buddy) != 0)
[ 1010.412673] ------------[ cut here ]------------
[ 1010.413930] kernel BUG at mm/page_alloc.c:800!
[ 1010.415143] invalid opcode: 0000 [#1] SMP PTI
[ 1010.416320] CPU: 3 PID: 1340 Comm: kworker/3:0 Not tainted 5.9.0-rc2-mm1-v5.9-rc2-200917-1952-00212-gf1a0765b04cb+ #33
[ 1010.419101] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 1010.422645] Workqueue: mm_percpu_wq drain_local_pages_wq
[ 1010.424075] RIP: 0010:__free_one_page+0x552/0x580
[ 1010.425344] Code: 48 c7 c6 90 6c 0f 84 4c 89 e7 e8 69 7e fd ff 0f 0b 0f 1f 44 00 00 e9 e5 fc ff ff 48 c7 c6 c8 f3 11 84 4c 89 f7 e8 4e 7e fd ff <0f> 0b 83 fb 08 0f 86 cb fc ff ff 48 83 c4 20 5b 5d 41 5c 41 5d 41
[ 1010.430231] RSP: 0018:ffffaa96c171fda0 EFLAGS: 00010082
[ 1010.431651] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 1010.433598] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8dc8bbd18d08
[ 1010.435627] RBP: 00000000000bff88 R08: ffff8dc8bbd18d00 R09: 6573756163656220
[ 1010.437544] R10: 6163656220646570 R11: 6d75642065676170 R12: ffffcddf02ffe200
[ 1010.439376] R13: 00000000000bff89 R14: ffffcddf02ffe240 R15: ffff8dc7bffd5680
[ 1010.441271] FS: 0000000000000000(0000) GS:ffff8dc8bbd00000(0000) knlGS:0000000000000000
[ 1010.443349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1010.444892] CR2: 00007f6b69f92000 CR3: 0000000139c4a000 CR4: 00000000001506e0
[ 1010.446746] Call Trace:
[ 1010.447424] free_pcppages_bulk+0x1d4/0x2c0
[ 1010.448553] drain_pages_zone+0x42/0x50
[ 1010.449585] drain_local_pages_wq+0xe/0x10
[ 1010.450702] process_one_work+0x1b0/0x360
[ 1010.451769] worker_thread+0x50/0x3a0
[ 1010.452940] ? process_one_work+0x360/0x360
[ 1010.454072] kthread+0xfe/0x140
[ 1010.454989] ? kthread_park+0x90/0x90
[ 1010.455970] ret_from_fork+0x22/0x30

This message seems to show that a page involved in the move to the buddy
freelist still has a nonzero refcount (the dumped page has refcount:1).
Could you check which changes between v3 and v4 might cause this?
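
For reference, the RIP in the trace is in __free_one_page, where the allocator finds the buddy of the block being freed by XOR-ing its pfn with the block size. Below is a userspace sketch of that arithmetic only, not the kernel code; the helper names buddy_pfn and combined_pfn are made up for illustration. Applied to the pfn in the page dump, it shows that pfn 0xbff89 and pfn 0xbff88 are order-0 buddies.

```c
#include <assert.h>
#include <limits.h>

/*
 * Userspace illustration only: the XOR arithmetic used to locate the
 * buddy of a block of 2^order pages starting at pfn.
 */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	/* Shifting by the full width of the type would be undefined. */
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return pfn ^ (1UL << order);
}

/* First pfn of the merged block formed by a block and its buddy. */
static unsigned long combined_pfn(unsigned long pfn, unsigned int order)
{
	return pfn & buddy_pfn(pfn, order);
}
```

Here buddy_pfn(0xbff89, 0) gives 0xbff88 and combined_pfn(0xbff89, 0) gives 0xbff88 as well. RBP in the register dump holds 0xbff88, which might be the pfn being freed, though that is only a guess from the registers.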

Here's my reproducer.

[build1:~]$ cat test_ksm_madv_soft.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/types.h>
#include <errno.h>
#include <stdlib.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101
#endif

#define err(x) do { perror(x); exit(EXIT_FAILURE); } while (0)

int main(void)
{
	int ret;
	size_t size = 100000 * 0x1000UL;	/* 100000 pages of 4 KiB each */

	/* Two anonymous regions that will get identical contents. */
	char *p1 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p1 == MAP_FAILED)
		err("mmap(p1)");
	printf("p1 %p\n", (void *)p1);
	char *p2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p2 == MAP_FAILED)
		err("mmap(p2)");
	printf("p2 %p\n", (void *)p2);

	/* Mark both regions as candidates for KSM merging. */
	ret = madvise(p1, size, MADV_MERGEABLE);
	printf("madvise(p1) %d\n", ret);
	ret = madvise(p2, size, MADV_MERGEABLE);
	printf("madvise(p2) %d\n", ret);

	printf("writing p1 ... ");
	memset(p1, 'a', size);
	printf("done\n");
	printf("writing p2 ... ");
	memset(p2, 'a', size);
	printf("done\n");

	/* Wait 10 seconds so that ksmd has time to scan the regions. */
	sleep(10);
	printf("soft offline\n");
	ret = madvise(p1, size, MADV_SOFT_OFFLINE);
	printf("soft offline returns %d\n", ret);
	if (ret)
		err("madvise");

	madvise(p1, size, MADV_UNMERGEABLE);
	madvise(p2, size, MADV_UNMERGEABLE);

	printf("OK\n");
	return 0;
}

[build1:~/upstream/mm_regression/lib]$ cat tmp_run_ksm_madv.sh

rm test_ksm_madv_soft 2> /dev/null
gcc -o test_ksm_madv_soft test_ksm_madv_soft.c || exit 1

echo 0 > /sys/kernel/mm/ksm/sleep_millisecs
echo 100000 > /sys/kernel/mm/ksm/pages_to_scan
echo 100000 > /sys/kernel/mm/ksm/max_page_sharing
echo 2 > /sys/kernel/mm/ksm/run
echo 1 > /sys/kernel/mm/ksm/run

./test_ksm_madv_soft
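
As a side note, the reproducer relies on a fixed 10 second wait before the soft offline. One way to check that KSM has actually merged pages by then would be to read a counter such as /sys/kernel/mm/ksm/pages_sharing. The helper below is a minimal sketch of that idea and is not part of the reproducer above; the name read_counter is hypothetical.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style file, for example
 * /sys/kernel/mm/ksm/pages_sharing. Returns -1 if the file cannot be
 * opened or does not start with a number.
 */
static long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	long val;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A caller could poll read_counter("/sys/kernel/mm/ksm/pages_sharing") until it becomes positive, instead of sleeping for a fixed time, before issuing MADV_SOFT_OFFLINE.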

Thanks,
Naoya Horiguchi