Re: [PATCH] panic: disable optimistic spin after halting CPUs

From: Petr Mladek
Date: Thu Jan 20 2022 - 11:49:02 EST

Next message: Rob Herring: "Re: [PATCH v2 08/11] PCI: mvebu: Use child_ops API"
Previous message: Vladimir Oltean: "Re: [PATCH RFC V1 net-next 3/4] net: Let the active time stamping layer be selectable."
In reply to: Stephen Brennan: "Re: [PATCH] panic: disable optimistic spin after halting CPUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue 2022-01-18 15:13:51, Stephen Brennan wrote:
> Petr Mladek <pmladek@xxxxxxxx> writes:
> > On Thu 2022-01-13 11:36:35, Stephen Brennan wrote:
> Hi Petr,
>
> Sorry for the delayed response due to the US holiday weekend.

No problem at all :-)

> I switched to your approach, and began running it through my test script
> (from the commit message) on QEMU. However, 14% of the time (200/1409) I
> found that the system fell into an interesting race condition / loop.
>
> 178 void panic(const char *fmt, ...)
> 179 {
> ...
> 187 /*
> 188 * Disable local interrupts. This will prevent panic_smp_self_stop
> 189 * from deadlocking the first cpu that invokes the panic, since
> 190 * there is nothing to prevent an interrupt handler (that runs
> 191 * after setting panic_cpu) from invoking panic() again.
> 192 */
> 193 local_irq_disable();
> 194 preempt_disable_notrace();
> ...
> 211 this_cpu = raw_smp_processor_id();
> 212 old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
> 213
> 214 if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
> 215 panic_smp_self_stop();
> ...
> 226 pr_emerg("Kernel panic - not syncing: %s\n", buf);
> ...
> 250 if (!_crash_kexec_post_notifiers) { // <- HALT CPUs
> ...
>
> We disable IRQ and preemption at the beginning of panic(), then set
> panic_cpu. This opens a window (between lines 212 and 250) where
> printk() spinning is disabled. Then, we go ahead and we use printk()
> during this window, at line 226.
>
> If another CPU is adding to the log buffer at the right time (concurrent
> with the pr_emerg in line 226), then they can successfully prevent the
> panic CPU from making progress. Since spinning is disabled, the other
> CPU in vprintk_emit() will never get the console_sem, and will just
> leave its buffered contents in the log buffer. And if the other CPU can
> add to the log buffer faster than the panic CPU can drain it... then the
> panic CPU is stuck forever writing into the console. And I do mean
> forever.

Yeah. The console waiter logic helps a bit to avoid this livelock.

> If a watchdog triggers, it will find panic_cpu already set, and
> so it won't be able to do anything!
>
> Thus I'm now testing the following modification:
>
> // in console_trylock_spinning()
> if (atomic_read(&panic_cpu) == this_cpu)
> return 0;
>
> This would ensure that the panic CPU won't fall into the spin loop, but
> also ensures that other CPUs can't flood the panic CPU with buffered
> messages.

Hmm, I am not sure if we really want it:

The other CPU will likely become the console_sem owner. If the
other CPU is running, the current console_sem owner is likely
running as well and will be able to pass console_sem.

Yes, it will throttle the other CPU. It will become busy with
pushing its own messages to the console. So far so good.

But panic() will stop the other CPU at some stage. And the
panic CPU will not be able to push the messages to the
console because @console_sem is owned by the other (stopped) CPU.

panic CPU will try to get the messages out later in
console_flush_on_panic(). But it is prone to deadlocks.

We want panic CPU to become console_sem owner and be able to flush
the messages to the consoles a clean way. This is why I proposed
the 2nd patch where the non-panic owner releases console_sem.

I wonder if we could prevent the livelock another way. The livelock
happens when the messages added faster into the log buffer than
they can reach the console. The result must be that some messages
are lost, see in console_unlock():

if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;
}

A solution might be to disable printk() on non-panic CPUs when
some messages are repeatedly dropped during panic. I mean
something like:

void console_unlock(void)
{
[...]
+ static panic_console_dropped;
[...]
if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;
+
+ if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID) {
+ if (panic_console_dropped++ > 10)
+ suppress_non_panic_printk = 1;
+ }
}

, where "suppress_non_panic_printk" will have similar effect as
"supress_printk" except that it will supress only printk
on non-panic CPUs.

I am not completely sure that it is the right approach. But it is not
completely bad:

+ allows the panic() CPU to become clean console_sem owner
+ blocks printk() only when many messages are lost anyway

> > Well, we could do this change separately. I could send the patch
> > for this part if you would prefer it. But feel free to send it
> > yourself.
>
> I'd be glad to write this patch, put it through my VM stress testing,
> and then send it. I've already got it setup so it might as well get put
> to good use :)

Great, thanks.

Best Regards,
Petr

Next message: Rob Herring: "Re: [PATCH v2 08/11] PCI: mvebu: Use child_ops API"
Previous message: Vladimir Oltean: "Re: [PATCH RFC V1 net-next 3/4] net: Let the active time stamping layer be selectable."
In reply to: Stephen Brennan: "Re: [PATCH] panic: disable optimistic spin after halting CPUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]