Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe

From: Sergey Senozhatsky
Date: Wed Jul 17 2019 - 05:56:24 EST


On (07/16/19 09:28), Petr Mladek wrote:
> Kernel tries hard to store and show printk messages when panicking. Even
> logbuf_lock gets re-initialized when only one CPU is running after
> smp_send_stop().
>
> Unfortunately, smp_send_stop() might fail on architectures that do not
> use NMI as a fallback. Then printk log buffer might stay locked and
> a deadlock is almost inevitable.

I'd say that deadlock is still almost inevitable.

panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
If there is an active printing-CPU, which is looping in console_unlock(),
taking logbuf_lock in order to msg_print_text() and stuff, then panic-CPU
will spin on console_owner waiting for that printing-CPU to hand over
printing duties.

	pr_emerg("Kernel panic - not syncing");
	smp_send_stop();


If printing-CPU goes nuts under logbuf_lock, has a corrupted IDT or anything
else, then we will not progress with panic(). panic-CPU will deadlock. If
not on

	pr_emerg("Kernel panic - not syncing")

then on another pr_emerg(), right before the NMI-fallback.

	static void native_stop_other_cpus()
	{
		...
		pr_emerg("Shutting down cpus with NMI\n");
		^^ deadlock here
		apic->send_IPI_allbutself(NMI_VECTOR);
		^^ not going to happen
		...
	}

And it's not only x86. In many cases if we fail to SMP_STOP other
CPUs, and one of them is holding logbuf_lock then we are done with
panic(). We will not return from smp_send_stop().

arm/kernel/smp.c

	void smp_send_stop(void)
	{
		...
		if (num_online_cpus() > 1)
			pr_warn("SMP: failed to stop secondary CPUs\n");
	}

arm64/kernel/smp.c

	void crash_smp_send_stop(void)
	{
		...
		pr_crit("SMP: stopping secondary CPUs\n");
		smp_cross_call(&mask, IPI_CPU_CRASH_STOP);

		...
		if (atomic_read(&waiting_for_crash_ipi) > 0)
			pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
				   cpumask_pr_args(&mask));
		...
	}

arm64/kernel/smp.c

	void smp_send_stop(void)
	{
		...
		if (num_online_cpus() > 1)
			pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
				   cpumask_pr_args(cpu_online_mask));
		...
	}


riscv/kernel/smp.c

	void smp_send_stop(void)
	{
		...
		if (num_online_cpus() > 1)
			pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
				cpumask_pr_args(cpu_online_mask));
		...
	}

And so on.
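
To make the common pattern in the snippets above concrete, here is a
minimal user-space sketch in C. It is not kernel code: struct model_lock,
model_printk() and model_stop_other_cpus() are hypothetical names made up
for illustration, and the spin is bounded by max_spins only so that the
model terminates (a real spinlock has no such bound). It assumes the stop
path prints one message before it reaches the NMI/IPI step.

```c
#include <stdbool.h>

/* Hypothetical user-space model of logbuf_lock; not the kernel's type. */
struct model_lock {
	bool held;	/* true while some "CPU" holds the lock */
	int owner;	/* id of the holding CPU, -1 if nobody holds it */
};

/*
 * Try to take the lock, giving up after max_spins attempts.
 * The bound is an assumption of this model so that it terminates;
 * the real lock would spin forever.
 */
static bool model_lock_bounded(struct model_lock *l, int cpu, long max_spins)
{
	for (long i = 0; i < max_spins; i++) {
		if (!l->held) {
			l->held = true;
			l->owner = cpu;
			return true;
		}
	}
	return false;
}

static void model_unlock(struct model_lock *l)
{
	l->held = false;
	l->owner = -1;
}

/* Models a pr_emerg()-style call: it needs the log buffer lock. */
static bool model_printk(struct model_lock *logbuf, int cpu, long max_spins)
{
	if (!model_lock_bounded(logbuf, cpu, max_spins))
		return false;	/* the real call would never return */
	model_unlock(logbuf);
	return true;
}

/*
 * Models a stop path that prints before the NMI/IPI fallback.
 * Returns true only if the fallback step would be reached.
 */
static bool model_stop_other_cpus(struct model_lock *logbuf, int panic_cpu,
				  long max_spins)
{
	if (!model_printk(logbuf, panic_cpu, max_spins))
		return false;	/* stuck on the message, fallback never sent */
	return true;		/* the NMI/IPI would be sent here */
}
```

With the lock held by another (stuck) CPU, model_stop_other_cpus() returns
false, i.e. the fallback is never reached. Once the lock is released, it
returns true.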

-ss