Re: [PATCH] panic: disable optimistic spin after halting CPUs
From: Stephen Brennan
Date: Tue Jan 18 2022 - 18:14:44 EST
Petr Mladek <pmladek@xxxxxxxx> writes:
> On Thu 2022-01-13 11:36:35, Stephen Brennan wrote:
>> Petr Mladek <pmladek@xxxxxxxx> writes:
>> > On Wed 2022-01-12 16:54:25, Stephen Brennan wrote:
[snip]
>> > I suggest to disable the spinning when panic is in progress. I mean
>> > something like:
>> >
>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -1842,6 +1842,10 @@ static int console_trylock_spinning(void)
>> > if (console_trylock())
>> > return 1;
>> >
>> > + /* Spinning is not safe when the system is stopped */
>> > + if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID)
>> > + return 0;
>> > +
>> > printk_safe_enter_irqsave(flags);
>> >
>> > raw_spin_lock(&console_owner_lock);
>>
>> I think this is pretty much equivalent to my fix, since it also results
>> in the panicking CPU failing to acquire console_sem during
>> console_trylock_spinning().
>
> Yes.
>
>> I do think this is better than my fix :) since it doesn't clutter up
>> panic() even more, nor introduce an additional function. It even avoids
>> resetting the console_owner_lock spinlock.
>
> Yes. I agree.
>
> It also looks slightly more straightforward to me. It might be matter
> of taste. It is just that I misunderstood the effect of your variant
> yesterday ;-)
>
>> I'd be happy to do this as a v2, if you'd prefer?
>
> Yes, please. Go for it.
Hi Petr,
Sorry for the delayed response due to the US holiday weekend.
I switched to your approach, and began running it through my test script
(from the commit message) on QEMU. However, 14% of the time (200/1409) I
found that the system fell into an interesting race condition / loop.
178 void panic(const char *fmt, ...)
179 {
...
187 /*
188 * Disable local interrupts. This will prevent panic_smp_self_stop
189 * from deadlocking the first cpu that invokes the panic, since
190 * there is nothing to prevent an interrupt handler (that runs
191 * after setting panic_cpu) from invoking panic() again.
192 */
193 local_irq_disable();
194 preempt_disable_notrace();
...
211 this_cpu = raw_smp_processor_id();
212 old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
213
214 if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
215 panic_smp_self_stop();
...
226 pr_emerg("Kernel panic - not syncing: %s\n", buf);
...
250 if (!_crash_kexec_post_notifiers) { // <- HALT CPUs
...
We disable IRQ and preemption at the beginning of panic(), then set
panic_cpu. This opens a window (between lines 212 and 250) where
printk() spinning is disabled. Then, we go ahead and we use printk()
during this window, at line 226.
If another CPU is adding to the log buffer at the right time (concurrent
with the pr_emerg in line 226), then they can successfully prevent the
panic CPU from making progress. Since spinning is disabled, the other
CPU in vprintk_emit() will never get the console_sem, and will just
leave its buffered contents in the log buffer. And if the other CPU can
add to the log buffer faster than the panic CPU can drain it... then the
panic CPU is stuck forever writing into the console. And I do mean
forever. If a watchdog triggers, it will find panic_cpu already set, and
so it won't be able to do anything!
Thus I'm now testing the following modification:
// in console_trylock_spinning()
if (atomic_read(&panic_cpu) == this_cpu)
return 0;
This would ensure that the panic CPU won't fall into the spin loop, but
also ensures that other CPUs can't flood the panic CPU with buffered
messages.
I'll test this method and ensure there are no hangs, then send the patch.
>
>> > It would be also great when the current owner releases console_lock
>> > so that the panicing CPU could take over it.
>> >
>> > I think about the following. Well, I am not sure if it would help
>> > in the real life because other CPUs are stopped quite early in panic().
>> >
>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -2646,13 +2650,18 @@ void console_unlock(void)
>> >
>> > for (;;) {
>> > size_t ext_len = 0;
>> > - int handover;
>> > + int handover, pcpu;
>> > size_t len;
>> >
>> > skip:
>> > if (!prb_read_valid(prb, console_seq, &r))
>> > break;
>> >
>> > + /* Allow the panic_cpu to take over consoles a safe way. */
>> > + pcpu = read_atomic(&panic_cpu);
>> > + if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
>> > + break;
>> > +
>> > if (console_seq != r.info->seq) {
>> > console_dropped += r.info->seq - console_seq;
>> > console_seq = r.info->seq;
>> >
>>
>> I suppose this could help reduce the odds of a CPU getting interrupted
>> during call_console_drivers(), and it might reduce the odds of
>> console_sem being held by a halted CPU during panic().
>>
>> However, it occurs to me that there are two cases:
>>
>> 1) a kdump kernel is loaded. In this case, crash_smp_send_stop() is
>> used, which (at least on x86_64) sends NMIs. In this case,
>> __crash_kexec() will not return (barring any errors) and we won't ever
>> really need to get the console_sem.
>
> This is likely the more safe case. NMI will make sure that
> the current owner will not mess with the console drivers any longer.
>
>
>> 2) no kdump kernel is loaded. In this case, smp_send_stop() is used,
>> which (again on x86_64) sends regular IPIs. In this case, we know we can
>> at least avoid interrupting call_console_drivers(), since that executes
>> with local IRQ disabled.
>
> This is probably more dangerous case. Regular IPIs will not stop the
> current owner when it is running with IRQ disabled. It means
> the it could still manipulate consoles and race with
> console_flush_on_panic().
On x86_64, smp_send_stop() will send an IPI and try to wait a grace
period for CPUs to stop, and after that grace period, use NMI, so there
is thankfully some protection.
>
> Note that printk() can be called in IRQ disabled context. Also note
> that some architectures do not have NMI. They use normal IRQ even
> in the 1st case.
Yes, this is a very important consideration.
>
>
>> So I'm also unsure how much this would help in the real world. But it's
>> a small change and so it doesn't need to help much in order to be a net
>> positive :)
>
> I agree that it is a corner case. But I think that it is worth it.
>
> Well, we could do this change separately. I could send the patch
> for this part if you would prefer it. But feel free to send it
> yourself.
I'd be glad to write this patch, put it through my VM stress testing,
and then send it. I've already got it setup so it might as well get put
to good use :)
Thanks,
Stephen
>
> Best Regards,
> Petr