Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU Hotplug
From: Aboorva Devarajan
Date: Thu Jul 11 2024 - 08:01:21 EST
Hi Tejun,
On Tue, 2024-07-09 at 14:02 -1000, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 09, 2024 at 01:21:38PM +0530, Aboorva Devarajan wrote:
> ...
> > Steps to reproduce the issue:
> >
> > - Run the scx_simple scheduler.
> > - Randomly offline and online CPUs from 1 to 127.
> >
> > During the CPU hotplug, the custom scheduler successfully
> > unregisters and registers itself without issues. However, it
> > occasionally encounters a crash and the system becomes unresponsive.
>
> How reproducible is the problem?
>
>
This is highly transient and occurs approximately once in every 15-20 mins
when continuosly running CPU hotplug online and offline operations while
`scx_simple` scheduler is up and running.
> ...
> > [64650.801834] task:scx_simple state:D stack:0 pid:11214 tgid:11214 ppid:11013 flags:0x00040080
> > [64650.801949] Sched_ext: simple (prepping)
> > [64650.801978] Call Trace:
> > [64650.802062] [c000000e9bdf35b0] [c0000000016f0c06] scx_dump_state.trunc_marker+0x2078/0x5d2a (unreliable)
> > [64650.802231] [c000000e9bdf3760] [c00000000001dd68] __switch_to+0x238/0x310
> > [64650.802437] [c000000e9bdf37c0] [c0000000013e6170] __schedule+0xa10/0xf80
> > [64650.802599] [c000000e9bdf38c0] [c0000000013e6764] schedule+0x84/0x128
> > [64650.802700] [c000000e9bdf38f0] [c00000000023f4dc] percpu_rwsem_wait+0x13c/0x1a0
> > [64650.802807] [c000000e9bdf3970] [c0000000013ee7ec] __percpu_down_read+0xac/0x21c
> > [64650.802926] [c000000e9bdf39f0] [c0000000001739b8] cpus_read_lock+0x158/0x180
> > [64650.803065] [c000000e9bdf3a30] [c00000000021fd88] bpf_scx_reg+0x5d8/0xb80
> > [64650.803161] [c000000e9bdf3c10] [c00000000049e308] bpf_struct_ops_link_create+0x158/0x200
> > [64650.803259] [c000000e9bdf3c80] [c000000000407628] link_create+0x78/0x3b0
> > [64650.803319] [c000000e9bdf3cd0] [c0000000004026fc] __sys_bpf+0x39c/0x560
> > [64650.803389] [c000000e9bdf3dc0] [c000000000400b00] sys_bpf+0x50/0x80
> > [64650.803449] [c000000e9bdf3e00] [c00000000003218c] system_call_exception+0x10c/0x2b0
> > [64650.803559] [c000000e9bdf3e50] [c00000000000c7d4] system_call_common+0xf4/0x258
> ...
> > [64650.808330] Showing all locks held in the system:
> > [64650.808391] 1 lock held by kthreadd/2:
> > [64650.808431] #0: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x80/0x240
> ...
> > [64650.810515] 4 locks held by scx_simple/11214:
> > [64650.810606] #0: c000000002a609b0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x134/0x200
> > [64650.810781] #1: c000000002927170 (scx_ops_enable_mutex){+.+.}-{4:4}, at: bpf_scx_reg+0x80/0xb80
> > [64650.810886] #2: c0000000029259c8 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0x5d0/0xb80
> > [64650.811023] #3: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0x5d8/0xb80
> > [64650.811174] 7 locks held by hotplug.sh/11440:
> > [64650.811240] #0: c000000e9bfe6450 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0xcc/0x370
> > [64650.811362] #1: c000000e11b31e90 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10c/0x260
> > [64650.811514] #2: c000000df2892620 (kn->active#73){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x118/0x260
> > [64650.811675] #3: c000000002aebc08 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x6c/0x1a0
> > [64650.811814] #4: c000007ff4ce0128 (&dev->mutex){....}-{4:4}, at: online_store+0x94/0x1a0
> > [64650.811929] #5: c000000002913168 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_up+0xa4/0x1b0
> > [64650.812061] #6: c000000002913278 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_up+0x78/0x420
>
> It's difficult to tell from the log but I wonder whether the problem is CPU
> hotplug operation needing to fork to complete (e.g. maybe it needs to
> schedule a work item and flush it?) while sched_ext enable path first blocks
> forks through scx_fork_rwsem and then does cpus_read_lock(). That'd create
> ABBA deadlock. It's a bit tricky as the existing code already makes it
> impossible to flip the nesting order.
Yes, it looks like it could be a ABBA deadlock, but I'm not able to get enough
details with lockdep and lockstat enabled. I'll try to collect more traces and
investigate further.
>
> It could also be specific to ppc64 (e.g. there's something dependingon fork
> during CPU hotplug), or at least doesn't happen on x86. I'll probe it a bit
> more.
Sure, so far I haven't been able to reproduce the issue on x86, also I'm running
some tests for a longer duration to see if it can be recreated on x86 as well
inorder to rule out the possiblity of it being a ppc64 specific issue, it could
be that the issue occurs much rarer on x86 when compared to ppc64.
Thanks,
Aboorva