Re: [PATCH v2] arm64: kexec: flush log to console in NMI context

From: Huang Shijie
Date: Tue Apr 26 2022 - 05:34:33 EST

Hi Petr,
On Tue, Apr 26, 2022 at 10:19:02AM +0200, Petr Mladek wrote:
> On Sun 2022-04-24 15:19:52, Huang Shijie wrote:
> > If kdump is configured, nmi_panic() may run to machine_kexec().
> >
> > In NMI context, the defer_console_output() defers the console
> > output by using wake_up_klogd to flush the printk ringbuffer
> > to console.
> >
> > But in the machine_kexec, the system will reset, and there is
> > no chance for the wake_up_klogd to do its job. So we can _not_
> > see any log on the console since the nmi_panic
> > (nmi_panic() will disable the irq).
> >
> > This patch fixes this issue by using console_flush_on_panic()
> > to flush to console.
> >
> > After this patch, we can see all the log since the nmi_panic
> > in the panic console.
>
> This is not a good idea. The crashdump is the best source of
> information about the crashed system. It includes the complete
> log.

Sometimes, we cannot get the crashdump file, so any log is important
to us.

> The system is in unknown state during panic(). Any operation
> might break. Flushing consoles increases the risk that
> the crashdump will not get generated. The crashdump is more
> important. If the crashdump succeeds then the consoles are
> not needed.
>
> Note that printk() does not handle consoles in NMI because it might
> cause deadlock. console_flush_on_panic() tries to avoid deadlock
> caused by console_sem. Also the particular console drivers are
> more careful because oops_in_progress is set at this stage.
> But there is still a risk of deadlock. There might be other
> locks that do not check oops_in_progress. Also a potential
> double unlock might cause deadlock.

Okay, thanks for the detailed explanation.

> IMHO, the main motivation for this patch was to flush the per-CPU
> printk buffers (v1). But it is no longer needed. The buffers
> were removed in 5.15-rc1, see the commit 93d102f094be9beab28e
> ("printk: remove safe buffers").
>
> The only reason to call console drivers when crashdump is generated
> is to debug the kexec code path. But I am not sure if
> console_flush_on_panic() would help here. The kexec might fail
> anytime before or after this flush so that the important
> messages will not be visible anyway. John Ogness is going
> to add an atomic serial console that might be better for this
> use case.

I hope it will be ready as soon as possible.

Huang Shijie