Re: [RFC PATCH 00/11] printk: safe printing in NMI context

From: Jan Kara
Date: Fri May 30 2014 - 04:13:46 EST


On Thu 29-05-14 02:09:11, Frederic Weisbecker wrote:
> On Thu, May 29, 2014 at 12:02:30AM +0200, Jiri Kosina wrote:
> > On Fri, 9 May 2014, Petr Mladek wrote:
> >
> > > printk() cannot be used safely in NMI context because it uses internal locks
> > > and thus could cause a deadlock. Unfortunately there are circumstances when
> > > calling printk from NMI is very useful. For example, all WARN.*(in_nmi())
> > > would be much more helpful if they didn't lockup the machine.
> > >
> > > Another example would be arch_trigger_all_cpu_backtrace for x86 which uses NMI
> > > to dump traces on all CPU (either triggered by sysrq+l or from RCU stall
> > > detector).
> >
> > I am rather surprised that this patchset hasn't received a single review
> > comment for 3 weeks.
> >
> > Let me point out that the issues Petr is talking about in the cover letter
> > are real -- we've actually seen the lockups triggered by RCU stall
> > detector trying to dump stacks on all CPUs, and hard-locking machine up
> > while doing so.
> >
> > So this really needs to be solved.
>
> The lack of review may be partly due to a not very appealing changestat on an
> old codebase that is already unpopular:
>
> Documentation/kernel-parameters.txt | 19 +-
> kernel/printk/printk.c | 1218 +++++++++++++++++++++++++----------
> 2 files changed, 878 insertions(+), 359 deletions(-)
>
>
> Your patches look clean and pretty nice actually. They must be seriously
> considered if we want to keep the current locked ring buffer design and
> extend it to multiple per context buffers. But I wonder if it's worth to
> continue that way with the printk ancient design.
>
> If it takes more than 1000 line changes (including 500 added) to make it
> finally work correctly with NMIs by working around its fundamental flaws,
> shouldn't we rather redesign it to use a lockless ring buffer like ftrace
> or perf ones?
I agree that a lockless ring buffer would be a more elegant solution, but
it would also be much more intrusive and complex. Petr's patch set
basically leaves the ordinary printk path intact to avoid concerns about
regressions there.
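
For concreteness, the lockless idea can be sketched roughly as follows.
This is only a user-space illustration in C11, not kernel code and not what
Petr's patches do; log_buf, log_reserve, log_write and LOG_BUF_SIZE are
names made up for the example, and it ignores wrap-around, readers and
commit ordering, which is where the real complexity lies.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 256	/* hypothetical size, chosen for the example */

struct log_buf {
	char data[LOG_BUF_SIZE];
	atomic_size_t head;	/* next free offset; only ever grows */
};

/*
 * Reserve len bytes without taking a lock. A writer that gets interrupted
 * (for example by an NMI handler that also logs) already owns its region,
 * so the interrupting writer simply gets the next one and nobody spins.
 * Returns NULL when the request does not fit; the message is then dropped,
 * and because head is never rolled back, later writes are dropped as well.
 */
static char *log_reserve(struct log_buf *lb, size_t len)
{
	size_t off = atomic_fetch_add(&lb->head, len);

	if (off > LOG_BUF_SIZE || len > LOG_BUF_SIZE - off)
		return NULL;
	return lb->data + off;
}

/* Copy msg (without its terminating NUL) into a freshly reserved region. */
static int log_write(struct log_buf *lb, const char *msg)
{
	size_t len = strlen(msg);
	char *dst = log_reserve(lb, len);

	if (!dst)
		return -1;
	memcpy(dst, msg, len);
	return 0;
}
```

With a zero-initialized struct log_buf, two successive log_write() calls
land back to back in data[] regardless of which context issued them. Even
this toy version hints at why a full conversion is intrusive: everything
the current lock serializes would have to be reworked around the
reservation step.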

Given how difficult and time consuming it is to push any complex changes
to printk, I'd push for fixing printk from NMI in this inelegant but
relatively non-contentious way, and work on converting printk to a lockless
implementation in the long term. But before spending a huge amount of time
on that, I'd like to get some wider consensus that this is really the way
we want to go, at least from AKPM and Steven. I think that is something for
discussion in the KS topic I proposed [1].

Honza

[1]
http://lists.linuxfoundation.org/pipermail/ksummit-discuss/2014-May/000598.html
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR