Re: [PATCH 0/9] sched: make WARN_ON under rq->lock deadlock-safe (SCHED_WARN_ON)

From: Peter Zijlstra

Date: Thu Jun 11 2026 - 15:19:56 EST


On Thu, Jun 11, 2026 at 12:20:27PM -0400, Rik van Riel wrote:
> On Thu, 2026-06-11 at 09:43 +0200, Peter Zijlstra wrote:
> >
> > Sorry, no. I've said it before and I'll stick with it. Just no.
> >
> > printk_deferred() is an abomination, it means that if you mess up the
> > machine properly you'll *NEVER* see the output.
> >
> > As per always, printk() is the one that needs fixing, and IIRC they were
> > very close to getting there.
>
> Printk to certain console types is always deferred,
> by default, because trying to synchronously print
> everything to a slow serial console can lead to a
> system softlockup panic.
>
> In fact, this particular lockup is due to the
> printk being passed off to a worker thread, and
> the kernel deadlocking when the wakeup code
> tries to grab the runqueue lock its CPU already
> holds.
>
> You are right that this could be fixed in the
> printk code, but the solution there will by
> necessity continue to contain some deferring.
>
>
> I suppose the printk code could use something
> like an irq work to wake up the printk worker,
> and avoid the scheduler deadlock that way?
>
> I'm not sure that would make things more
> reliable, though...

The non-atomic consoles will always need a buffer, but atomic consoles
can (and should IMO) push out the messages immediately.

Anyway, the thing that keeps tripping is that console_sem thing, that
should just entirely go away. That thing ends up doing a wakeup from
printk() call context, which obviously doesn't work when inside the
scheduler locks.

So printk should:

- stick msg in buffer (lockless)
- print to atomic consoles (lockless)
- use irq_work to wake console kthreads (lockless)
- each kthread then tries to flush buffer to its own non-atomic console
  in non-atomic context.
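
The four steps above can be modelled in userspace as a rough sketch. This
is not the kernel implementation: every name here (sketch_printk(),
sketch_irq_work_run(), sketch_console_kthread(), the ring layout and its
size) is hypothetical, the "consoles" are just counters, and ring overrun
is ignored. It only illustrates that the printk() caller never takes a
lock and never performs a wakeup itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SLOTS   8u              /* hypothetical ring size */
#define MSG_MAX 64

struct record {
	char text[MSG_MAX];
	atomic_bool committed;      /* set once text is fully written */
};

static struct record ring[SLOTS];
static atomic_uint head;            /* next slot to reserve */
static unsigned int tail;           /* owned by the single flusher */
static atomic_bool wake_pending;    /* stands in for a queued irq_work */
static atomic_uint atomic_emitted;  /* records pushed to the "atomic console" */

/* Safe "from any context" in this model: no locks, no wakeups. */
void sketch_printk(const char *msg)
{
	/* 1. stick msg in buffer: reserve a slot, copy, then publish */
	unsigned int slot = atomic_fetch_add(&head, 1) % SLOTS;
	struct record *r = &ring[slot];

	strncpy(r->text, msg, MSG_MAX - 1);
	r->text[MSG_MAX - 1] = '\0';
	atomic_store_explicit(&r->committed, true, memory_order_release);

	/* 2. atomic consoles get the message immediately (counted here) */
	atomic_fetch_add(&atomic_emitted, 1);

	/* 3. do not wake anything; only note that a wakeup is wanted */
	atomic_store(&wake_pending, true);
}

/*
 * Models the irq_work callback, which runs later, outside the
 * scheduler locks. Returns true if it would wake the console kthreads.
 */
bool sketch_irq_work_run(void)
{
	return atomic_exchange(&wake_pending, false);
}

/* 4. kthread side: flush committed records in non-atomic context. */
unsigned int sketch_console_kthread(void)
{
	unsigned int flushed = 0;

	for (;;) {
		struct record *r = &ring[tail % SLOTS];

		if (!atomic_load_explicit(&r->committed, memory_order_acquire))
			break;
		/* a real driver would write r->text to its device here */
		atomic_store(&r->committed, false);
		tail++;
		flushed++;
	}
	return flushed;
}
```

In this model, calling sketch_printk() twice bumps the atomic console count
to two right away, while the non-atomic side sees nothing until
sketch_irq_work_run() has fired and sketch_console_kthread() gets to run.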

From what I understand, we're very close to having this work. The only
disagreement between printk people and me is about defaults IIRC. I want
to have the serial console default to atomic, they want it to be an
option. But whatever, as long as I can specify on the kernel cmdline
that my serial should be atomic I'm good.

Myself, I almost exclusively run with earlycon serial and force printk
to be early_printk() (effectively not using printk at all). This works
perfectly fine and is the most reliable thing ever. I can push out
characters to the UART from any context.

And sure, sometimes it gets a little scrambled, but meh.

I have this working with real actual serial, IPMI/serial-over-lan and
AMT/serial-over-lan.
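
For reference, a mainline approximation of that kind of setup on the kernel
cmdline might look like the fragment below. The I/O port and baud rate are
assumptions for a legacy PC UART, and this only keeps the boot console
registered; it does not by itself reroute printk() through early_printk().

```
earlycon=uart8250,io,0x3f8,115200n8 keep_bootcon
```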

And if a machine doesn't have serial, it's a paperweight ;-)

Now, of course I also use trace_printk() a lot, but for those moments
when the machine goes down hard, nothing beats serial.