Re: dmesg -w regression in v5.4.22, bisected, was: Re: [PATCH] char/random: silence a lockdep splat with printk()
From: Sergey Senozhatsky
Date: Tue Mar 24 2020 - 22:35:13 EST
On (20/03/24 11:13), Zygo Blaxell wrote:
> On Wed, Nov 13, 2019 at 04:16:25PM -0500, Qian Cai wrote:
> > From: Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx>
> >
> > Sergey didn't like the locking order,
> >
> > uart_port->lock -> tty_port->lock
> >
> >   uart_write (uart_port->lock)
> >     __uart_start
> >       pl011_start_tx
> >         pl011_tx_chars
> >           uart_write_wakeup
> >             tty_port_tty_wakeup
> >               tty_port_default
> >                 tty_port_tty_get (tty_port->lock)
> >
> > but that code is so old, and I have no clue how to de-couple it after
> > checking other locks in the splat. There is an ongoing effort to make all
> > printk() calls deferred, so until that happens, work around it for now as a
> > short-term fix.
>
> Starting with v5.4.22 I noticed 'dmesg -w' stopped working on some
> machines. dmesg will follow console output for a few seconds, then it
> stops. strace indicates dmesg is blocked in read() on the /dev/kmsg fd.
> If a new dmesg process starts, it gives messages for a few seconds,
> then also stops. rsyslog's kernel logging is similarly affected.
>
> Bisection points to this patch (now known as
> 1b710b1b10eff9d46666064ea25f079f70bc67a8 upstream). I can't reproduce
> the problem on a test VM, and some machines are running v5.4.22..v5.4.26
> with no dmesg problems. It seems there is some magic in the startup
> sequence of affected machines. This code isn't executed after RNG is
> seeded, so it would have to get its bad stuff done before that happens.
>
> Reverting commit 1b710b1b10eff9d46666064ea25f079f70bc67a8 fixes the
> dmesg regression on 5.4.26. It might put the original lockdep bug back,
> but on machines running stable kernels, I prefer randomly broken lockdep
> over repeatably broken dmesg.
This should fix the problem:
https://lore.kernel.org/lkml/20200303113002.63089-1-sergey.senozhatsky@xxxxxxxxx
-ss
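
For anyone who wants to check whether a machine shows the stall described
above (a reader blocked in read() on /dev/kmsg while the kernel keeps
logging), here is a minimal sketch of a follower with an idle timeout. It is
not how dmesg or rsyslog are implemented. The names follow() and
parse_record() are hypothetical, and the parser assumes the documented
/dev/kmsg record layout "prio,seq,timestamp_us,flags;message".

```python
import os
import select


def parse_record(raw):
    """Split one /dev/kmsg record into (seq, timestamp_us, message).

    Assumes the "prio,seq,timestamp_us,flags;message" layout; any
    continuation lines after the first newline are ignored.
    """
    text = raw.decode("utf-8", errors="replace")
    header, _, rest = text.partition(";")
    fields = header.split(",")
    seq = int(fields[1])
    timestamp_us = int(fields[2])
    message = rest.split("\n", 1)[0]
    return seq, timestamp_us, message


def follow(fd, idle_timeout_s, on_record):
    """Deliver records from fd to on_record until fd stays quiet.

    Returns the number of records seen once nothing arrives for
    idle_timeout_s seconds (a possible stall) or the fd reports EOF.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    count = 0
    while True:
        if not poller.poll(int(idle_timeout_s * 1000)):
            return count  # no data within the timeout
        try:
            data = os.read(fd, 8192)  # /dev/kmsg returns one record per read
        except BrokenPipeError:
            continue  # EPIPE: records were overwritten, next read resumes
        if not data:
            return count  # EOF, only expected on pipes or regular files
        on_record(parse_record(data))
        count += 1
```

To try it against the real log (this needs read access to /dev/kmsg), open
the device with os.open("/dev/kmsg", os.O_RDONLY) and pass the descriptor to
follow() with a timeout of a minute or so, while generating kernel messages
from another shell. If follow() returns while new messages are still reaching
the console, the reader is no longer being woken up.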