Re: [BUG] workqueues and printk not playing nice since next-20240130

From: John Ogness
Date: Fri Feb 02 2024 - 12:02:41 EST


On 2024-02-02, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
>> The printk ringbuffer contents would certainly be interesting.
>>
>> If you build the GDB scripts (CONFIG_GDB_SCRIPTS) then you will have:
>>
>> (gdb) lx-dmesg
>
> This says no such command even though I do have CONFIG_GDB_SCRIPTS=y
> in my .config.

You actually need to build them as well. The make target is "scripts_gdb".

And you probably need to add:

add-auto-load-safe-path /path/to/your/kernel/build/directory

to your .gdbinit.

(This is documented in Documentation/dev-tools/gdb-kernel-debugging.rst)

>> As an alternative, you could copy the contents of
>> Documentation/admin-guide/kdump/gdbmacros.txt into your .gdbinit and
>> then will have:
>>
>> (gdb) dmesg
>
> This one hangs.

:-/ I will look into this.

> On the other hand, next-20240202 doesn't show the problem. No idea
> what might have changed. :-/

Did you check the backtrace on all the "threads"? I would expect one of
them has tty in it and is probably deadlocked. There is a known problem:
if a WARN or lockdep triggers while the port lock is held, that CPU
will deadlock itself. The effect is that no output is generated, but
all the other CPUs keep running fine. Their printk() calls will even
happily store into the ringbuffer, because printk() only uses trylock
for printing and the deadlocked CPU is holding that lock.
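
To illustrate the "store first, then only trylock for printing" behaviour,
here is a rough userspace sketch. It is not the kernel code: every
sketch_* name is made up for this mail, a pthread mutex stands in for the
lock, and a small array stands in for the ringbuffer (wrap-around loss is
ignored).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_RB_SIZE 8
#define SKETCH_MSG_LEN 64

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static char sketch_rb[SKETCH_RB_SIZE][SKETCH_MSG_LEN];
static unsigned int sketch_stored;   /* records written to the buffer */
static unsigned int sketch_flushed;  /* records pushed to the "console" */
static pthread_mutex_t sketch_print_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Store the message, then only *try* to print.
 * Returns true if this caller flushed the pending records.
 */
static bool sketch_printk(const char *msg)
{
	assert(msg != NULL);

	/* Step 1: storing never needs the print lock. */
	snprintf(sketch_rb[sketch_stored % SKETCH_RB_SIZE],
		 SKETCH_MSG_LEN, "%s", msg);
	sketch_stored++;

	/* Step 2: trylock; if the lock is held, do not wait for it. */
	if (pthread_mutex_trylock(&sketch_print_lock) != 0)
		return false;

	/* We own the lock, so flush everything that is pending. */
	while (sketch_flushed < sketch_stored) {
		fputs(sketch_rb[sketch_flushed % SKETCH_RB_SIZE], stderr);
		sketch_flushed++;
	}

	pthread_mutex_unlock(&sketch_print_lock);
	return true;
}
```

If sketch_print_lock is taken and never released (the deadlocked CPU in
the scenario above), every sketch_printk() call still returns quickly and
sketch_stored keeps growing while sketch_flushed stays put. That is the
"ringbuffer fills up but nothing shows on the console" picture.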

John