Re: scheduling while atomic in z3fold

From: Mike Galbraith
Date: Tue Dec 01 2020 - 21:33:10 EST


On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > How do you test this? I triggered a few oom-killers, and I have git
> > > gc running here for a few hours now… Everything is fine.
> >
> > In an LTP install, ./runltp -f mm. Shortly after box starts swapping
> > insanely, it explodes quite reliably here with either z3fold or
> > zsmalloc... but not with zbud.
>
> This just passed. It did, however, kill my git-gc task, which wasn't done.
> Let me try tomorrow with your config.

What I'm seeing is the below. rt_mutex_has_waiters() says yup, we have
a waiter, rt_mutex_top_waiter() hands back the missing (NULL) cached
leftmost, and rt_mutex_dequeue_pi() chokes on it trying to read
NULL->pi_tree_entry, the 0x18 in CR2 below. Lock is buggered: the crash
dump at the bottom shows waiters.rb_root populated but rb_leftmost NULL,
a combination the rbtree code should never produce.
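
(Aside: the broken state is easy to model outside the kernel. Below is
a minimal userspace sketch, not kernel code: the struct shapes mimic
rb_root_cached, and the two helpers are re-implemented to show what
rt_mutex_has_waiters() and rt_mutex_top_waiter() boil down to. With
rb_node populated but rb_leftmost zeroed, the waiter check passes and
the top-waiter lookup hands back NULL.)

#include <stdio.h>
#include <stddef.h>

/* Userspace mimic of the kernel's rb_root_cached layout; shapes are
 * assumptions for illustration, not copied from a matching tree. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

struct rb_root {
	struct rb_node *rb_node;
};

struct rb_root_cached {
	struct rb_root rb_root;
	struct rb_node *rb_leftmost;
};

/* What rt_mutex_has_waiters() boils down to: is the root populated? */
static int has_waiters(const struct rb_root_cached *waiters)
{
	return waiters->rb_root.rb_node != NULL;
}

/* What rt_mutex_top_waiter() boils down to: the cached leftmost node. */
static struct rb_node *top_waiter(const struct rb_root_cached *waiters)
{
	return waiters->rb_leftmost;
}

int main(void)
{
	struct rb_node stale = { 0, NULL, NULL };

	/* The state in the crash dump below: rb_node set, rb_leftmost NULL. */
	struct rb_root_cached waiters = {
		.rb_root = { .rb_node = &stale },
		.rb_leftmost = NULL,
	};

	if (has_waiters(&waiters))		/* passes... */
		printf("top waiter: %p\n",	/* ...and prints a null pointer */
		       (void *)top_waiter(&waiters));
	return 0;
}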

[ 894.376962] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 894.377639] #PF: supervisor read access in kernel mode
[ 894.378130] #PF: error_code(0x0000) - not-present page
[ 894.378735] PGD 0 P4D 0
[ 894.378974] Oops: 0000 [#1] PREEMPT_RT SMP PTI
[ 894.379384] CPU: 2 PID: 78 Comm: oom_reaper Kdump: loaded Tainted: G E 5.9.11-rt20-rt #9
[ 894.380253] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 894.381352] RIP: 0010:mark_wakeup_next_waiter+0x51/0x150
[ 894.381869] Code: 00 00 49 89 f5 e8 9f 1c 7c 00 48 8b 5d 10 48 85 db 74 0a 48 3b 6b 38 0f 85 00 01 00 00 65 4c 8b 3c 25 c0 8d 01 00 4c 8d 73 18 <4c> 39 73 18 0f 85 94 00 00 00 65 48 8b 3c 25 c0 8d 01 00 48 8b 87
[ 894.383640] RSP: 0018:ffffb792802cfb18 EFLAGS: 00010046
[ 894.384135] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 894.384804] RDX: 0000000000000001 RSI: ffffb792802cfb68 RDI: 0000000000000001
[ 894.385473] RBP: ffff997b4e508628 R08: ffff997b39075000 R09: ffff997a47800db0
[ 894.386134] R10: 0000000000000000 R11: ffffffff8a58f4d8 R12: ffffb792802cfb58
[ 894.387030] R13: ffffb792802cfb68 R14: 0000000000000018 R15: ffff997a7f1d3300
[ 894.387715] FS: 0000000000000000(0000) GS:ffff997b77c80000(0000) knlGS:0000000000000000
[ 894.388476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 894.389209] CR2: 0000000000000018 CR3: 00000001cc156006 CR4: 00000000001706e0
[ 894.389881] Call Trace:
[ 894.390127] rt_mutex_futex_unlock+0x4f/0x90
[ 894.390547] z3fold_zpool_free+0x539/0x5c0
[ 894.390930] zswap_free_entry+0x43/0x50
[ 894.391193] zswap_frontswap_invalidate_page+0x8a/0x90
[ 894.391544] __frontswap_invalidate_page+0x48/0x80
[ 894.391875] swapcache_free_entries+0x1ee/0x330
[ 894.392189] ? rt_mutex_futex_unlock+0x65/0x90
[ 894.392502] free_swap_slot+0xad/0xc0
[ 894.392757] __swap_entry_free+0x70/0x90
[ 894.393046] free_swap_and_cache+0x39/0xe0
[ 894.393351] unmap_page_range+0x5e1/0xb30
[ 894.393646] ? flush_tlb_mm_range+0xfb/0x170
[ 894.393965] __oom_reap_task_mm+0xb2/0x170
[ 894.394254] ? __switch_to+0x12a/0x520
[ 894.394514] oom_reaper+0x119/0x540
[ 894.394756] ? wait_woken+0xa0/0xa0
[ 894.394997] ? __oom_reap_task_mm+0x170/0x170
[ 894.395297] kthread+0x169/0x180
[ 894.395535] ? kthread_park+0x90/0x90
[ 894.395867] ret_from_fork+0x22/0x30
[ 894.396252] Modules linked in: ebtable_filter(E) ebtables(E) uinput(E) fuse(E) rpcsec_gss_krb5(E) nfsv4(E) xt_comment(E) dns_resolver(E) nfs(E) nf_log_ipv6(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) nfs_ssc(E) fscache(E>
[ 894.396280] cryptd(E) glue_helper(E) pcspkr(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) sch_fq_codel(E) hid_generic(E) usbhid(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) virtio_console(E) virtio_blk(E)>
[ 894.406791] Dumping ftrace buffer:
[ 894.407037] (ftrace buffer empty)
[ 894.407293] CR2: 0000000000000018

crash> gdb list *mark_wakeup_next_waiter+0x51
0xffffffff810e87e1 is in mark_wakeup_next_waiter (kernel/locking/rtmutex.c:362).
357     }
358
359     static void
360     rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
361     {
362             if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
363                     return;
364
365             rb_erase_cached(&waiter->pi_tree_entry, &task->pi_waiters);
366             RB_CLEAR_NODE(&waiter->pi_tree_entry);
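
That CR2 of 0x18 is consistent with the disassembly above (lea
0x18(%rbx),%r14 with %rbx == 0): rb_entry() on the NULL cached leftmost
yields a NULL waiter, and &waiter->pi_tree_entry is then a bare field
offset. A quick userspace offset check, with the layout inferred from
the disassembly rather than from a matching vmlinux:

#include <stdio.h>
#include <stddef.h>

struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* Layout inferred from the disassembly, not from a matching vmlinux:
 * tree_entry first (so rb_entry() of a NULL node gives a NULL waiter),
 * pi_tree_entry right behind it. */
struct rt_mutex_waiter_shape {
	struct rb_node tree_entry;	/* offset 0x00 */
	struct rb_node pi_tree_entry;	/* offset 0x18 on x86-64 */
};

int main(void)
{
	/* What RB_EMPTY_NODE(&waiter->pi_tree_entry) dereferences when
	 * waiter == NULL: the bare field offset. */
	printf("%#zx\n",
	       offsetof(struct rt_mutex_waiter_shape, pi_tree_entry));
	return 0;	/* prints 0x18, matching CR2 in the oops above */
}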

crash> rwlock_t -x 0xffff997b4e508628
struct rwlock_t {
  rtmutex = {
    wait_lock = {
      raw_lock = {
        {
          val = {
            counter = 0x1
          },
          {
            locked = 0x1,
            pending = 0x0
          },
          {
            locked_pending = 0x1,
            tail = 0x0
          }
        }
      }
    },
    waiters = {
      rb_root = {
        rb_node = 0xffff997b4e508580
      },
      rb_leftmost = 0x0
    },
    owner = 0xffff997a7f1d3300,
    save_state = 0x1
  },
  readers = {
    counter = 0x80000000
  }
}
crash> rb_root 0xffff997b4e508580
struct rb_root {
  rb_node = 0x0
}