Re: [RFC] Printk deadlock in bpf trace called from scheduler context

From: Marco Elver
Date: Mon Jul 29 2024 - 08:46:20 EST


On Mon, 29 Jul 2024 at 14:27, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 29, 2024 at 01:46:09PM +0200, Radoslaw Zielonek wrote:
> > I am currently working on a syzbot-reported bug where bpf
> > is called from trace_sched_switch. In this scenario, we are still within
> > the scheduler context, and calling printk can create a deadlock.
> >
> > I am uncertain about the best approach to fix this issue.
>
> It's been like this forever, it doesn't need fixing, because tracepoints
> shouldn't be doing printk() in the first place.
>
> > Should we simply forbid such calls, or perhaps we should replace printk
> > with printk_deferred in the bpf where we are still in scheduler context?
>
> Not doing printk() is best.

And teaching more debugging tools to behave.

This particular case originates from fault injection:

> [ 60.265518][ T8343] should_fail_ex+0x383/0x4d0
> [ 60.265547][ T8343] strncpy_from_user+0x36/0x2d0
> [ 60.265601][ T8343] strncpy_from_user_nofault+0x70/0x140
> [ 60.265637][ T8343] bpf_probe_read_user_str+0x2a/0x70

This is probably the fail_dump() function in lib/fault-inject.c being
a little too verbose in this case.

Radoslaw, the fix should be in lib/fault-inject.c. As with other
debugging tools (like KFENCE, which you discovered), adding
lockdep_off()/lockdep_on(), using printk_deferred(), or being less
verbose in this context may be more appropriate. Fault injection does
not need to print a message to inject a fault; the message is only
there for debugging purposes. A reasonable compromise is probably to
use printk_deferred() in fail_dump() when in this context, so that it
still helps with debugging on a best-effort basis. You also need to
take care to avoid dumping the stack in fail_dump().
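
To illustrate the shape of that compromise, here is a rough userspace
model, not a kernel patch. Every name in it (unsafe_print_context,
print_now(), print_deferred(), flush_deferred(), dump_stack_model(),
fail_dump_sketch()) is made up for the sketch. How the real fail_dump()
would detect "this context" is left open, and the flag below only
stands in for that check.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_DEFERRED 8

/* Hypothetical flag: true while printing directly could deadlock. */
bool unsafe_print_context;

int direct_msgs;   /* messages written immediately */
int deferred_msgs; /* messages currently queued for later */
int stack_dumps;   /* stack dumps emitted */

static const char *deferred_queue[MAX_DEFERRED];

/* Stand-in for printk(): writes right away. */
void print_now(const char *msg)
{
	fputs(msg, stdout);
	direct_msgs++;
}

/* Stand-in for printk_deferred(): only queues; drops when full. */
void print_deferred(const char *msg)
{
	if (deferred_msgs < MAX_DEFERRED)
		deferred_queue[deferred_msgs++] = msg;
}

/* Called later from a safe context; returns how many were written. */
int flush_deferred(void)
{
	int n = deferred_msgs;

	for (int i = 0; i < n; i++)
		fputs(deferred_queue[i], stdout);
	deferred_msgs = 0;
	return n;
}

/* Stand-in for dump_stack(): only counts the calls. */
void dump_stack_model(void)
{
	stack_dumps++;
}

/* Sketch of a fail_dump() that stays quiet in the unsafe context. */
void fail_dump_sketch(int verbose)
{
	static const char msg[] = "FAULT_INJECTION: forcing a failure.\n";

	if (verbose <= 0)
		return;

	if (unsafe_print_context) {
		/* Best effort: queue the message and skip the stack dump. */
		print_deferred(msg);
		return;
	}

	print_now(msg);
	if (verbose > 1)
		dump_stack_model();
}
```

In the unsafe context the message is only queued and no stack is
dumped, even at the higher verbosity; outside it the behaviour is
unchanged.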