Re: printk deadlock due to double lock attempt on current CPU's runqueue
From: Daniel Vetter
Date: Wed Nov 10 2021 - 05:44:21 EST
On Wed, Nov 10, 2021 at 11:13:37AM +0106, John Ogness wrote:
> On 2021-11-10, Daniel Vetter <daniel@xxxxxxxx> wrote:
> > I'm a bit out of the loop but from lwn articles my understanding is
> > that part of upstreaming from -rt we no longer have the explicit "I'm
> > a safe console for direct printing" opt-in. Which I get from a
> > backwards compat pov, but I still think for at least fbcon we really
> > should never attempt a direct printk con->write, it's just all around
> > terrible.
>
> Right now we don't have an explicit "I'm a safe console for direct
> printing" option. Right now all printing is direct. But it sounds to me
> that we should add this console flag when we introduce kthread printers.
Ah I wasnt' clear whether this was merged already or not yet.
> > So yeah for fbcon at least I think we really should throw out direct
> > con->write from printk completely.
>
> Even after we introduce kthread printers, there will still be situations
> where direct printing is used: booting (before kthreads exist) and
> shutdown/suspend/crash situations, when the kthreads may not be
> active.
>
> I will introduce a console flag so that consoles can opt-out for direct
> printing. (opt-out rather than opt-in is probably easier, since there
> are only a few that would need to opt-out).
Yeah opt-out for fbcon sounds good to me, and then across the board (even
when the kthread is not yet or no longer running, it really doesn't make
anything better). Maybe with a fbcon module (or system wide
force_direct_printk) option to disable that flag in case someone needs
it.
> Since kthread printers do not yet exist (hoping to get them in for
> 5.17), I am not sure how we should address the reported bug for existing
> kernels.
I think we just can't. This thing keeps popping up in all kinds of places,
and in some cases it's really not possible to fix it. Like in
oops_in_progress we can just give up, but for normal printk from funny
places we'd corrupt the console (plus there's no flag to even detect these
cases), since we really need to launch these work_struct to get stuff to
the screen.
I think in the past we sometimes managed to work around the issue by
moving the printk outside of locks, but that's rarely a good idea just
because fbcon.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch