Re: [BUG] workqueues and printk not playing nice since next-20240130

From: Paul E. McKenney
Date: Fri Feb 02 2024 - 12:36:31 EST


On Fri, Feb 02, 2024 at 06:08:25PM +0106, John Ogness wrote:
> On 2024-02-02, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
> >> The printk ringbuffer contents would certainly be interesting.
> >>
> >> If you build the GDB scripts (CONFIG_GDB_SCRIPTS) then you will have:
> >>
> >> (gdb) lx-dmesg
> >
> > This says no such command even though I do have CONFIG_GDB_SCRIPTS=y
> > in my .config.
>
> You actually need to build them as well. The target is "scripts_gdb".
>
> And you probably need to add:
>
> add-auto-load-safe-path /path/to/your/kernel/build/directory
>
> to your .gdbinit
>
> (This is documented in Documentation/dev-tools/gdb-kernel-debugging.rst)

Thank you! Next time I am in a similar situation, I will pay more
attention to the documentation.
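
For the archives, the steps John describes could be sketched roughly as
follows. This is only a sketch under assumptions: the helper name
add_safe_path and the use of ~/.gdbinit are illustrative choices, and the
build directory is left as the placeholder from John's message. The
"scripts_gdb" target and the add-auto-load-safe-path line are the ones
quoted above.

```shell
#!/bin/sh
# Hypothetical helper: append the auto-load line to a gdbinit file,
# but only if that exact line is not already present.
add_safe_path() {
    gdbinit=$1      # e.g. "$HOME/.gdbinit"
    build_dir=$2    # your kernel build directory
    line="add-auto-load-safe-path $build_dir"
    touch "$gdbinit"
    grep -qxF "$line" "$gdbinit" || printf '%s\n' "$line" >> "$gdbinit"
}

# Typical sequence (left commented out because it needs a kernel tree
# configured with CONFIG_GDB_SCRIPTS=y):
#   make scripts_gdb
#   add_safe_path "$HOME/.gdbinit" /path/to/your/kernel/build/directory
#   gdb vmlinux        # then, at the prompt: (gdb) lx-dmesg
```

Checking for the line before appending keeps .gdbinit from collecting
duplicates if the helper is run more than once.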

> >> As an alternative, you could copy the contents of
> >> Documentation/admin-guide/kdump/gdbmacros.txt into your .gdbinit and
> >> then you will have:
> >>
> >> (gdb) dmesg
> >
> > This one hangs.
>
> :-/ I will look into this.
>
> > On the other hand, next-20240202 doesn't show the problem. No idea
> > what might have changed. :-/
>
> Did you check the backtrace on all the "threads"? I would expect one of
> them has tty in it and is probably deadlocked. There is a known problem:
> if a WARN or lockdep triggers while holding the port lock, that CPU
> will deadlock itself. That has the effect that no output is generated,
> but all the other CPUs will run fine. And even printk() calls will
> happily store into the ringbuffer because they use trylock for printing
> and the deadlocked CPU will be holding the lock.

Again, thank you, and another thing for me to try should this start
happening again.
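
To make the trylock point concrete, here is a toy model in shell. It is
not kernel code and makes no claim about how printk is implemented: plain
files stand in for the ringbuffer and the console, a directory stands in
for the lock (mkdir either succeeds or fails at once, which is what makes
it behave like a trylock), and every name in it is made up for
illustration. It only shows the effect John describes: once something
holds the lock forever, messages keep being stored but are never printed.

```shell
#!/bin/sh
# Toy model only; all names below are hypothetical.
WORK=$(mktemp -d)
RINGBUF="$WORK/ringbuffer"   # stands in for the printk ringbuffer
CONSOLE="$WORK/console"      # stands in for console output
LOCK="$WORK/lock"            # stands in for the lock taken when printing
FLUSHED=0                    # records already copied to the console
: > "$RINGBUF"
: > "$CONSOLE"

trylock() { mkdir "$LOCK" 2>/dev/null; }  # fails immediately if held
unlock()  { rmdir "$LOCK"; }

toy_printk() {
    # Storing the record always succeeds.
    printf '%s\n' "$1" >> "$RINGBUF"
    # Printing happens only if the lock can be taken without waiting.
    if trylock; then
        tail -n +"$((FLUSHED + 1))" "$RINGBUF" >> "$CONSOLE"
        FLUSHED=$(($(wc -l < "$RINGBUF")))
        unlock
    fi
}

toy_printk "before deadlock"
trylock                      # the "deadlocked CPU" takes the lock for good
toy_printk "after deadlock 1"
toy_printk "after deadlock 2"
```

After this runs, the ringbuffer file holds all three records while the
console file holds only the first, which is why dumping the ringbuffer
from the debugger is the useful thing to try.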

Thanx, Paul