Re: [RFC][PATCHv2 3/7] printk: introduce per-cpu alt_print seq buffer

From: Petr Mladek
Date: Thu Oct 06 2016 - 10:57:26 EST


On Sat 2016-10-01 11:24:15, Sergey Senozhatsky wrote:
> On (10/01/16 00:17), Sergey Senozhatsky wrote:
> [..]
> > +void alt_printk_enter(void)
> > +{
> > + unsigned long flags;
> -
> > + local_irq_save(flags);
> > + if (!(this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK))
> > + this_cpu_write(alt_printk_irq_flags, flags);
> > + this_cpu_inc(alt_printk_ctx);
> > +}
> [..]
> > +void alt_printk_exit(void)
> > +{
> > + this_cpu_dec(alt_printk_ctx);
> > + if (!(this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK))
> > + local_irq_restore(this_cpu_read(alt_printk_irq_flags));
> > +}
>
> a few words why did I make it so complex here. it may look that nested
> alt_printk_enter()-s are not really possible; but it's not exactly the
> case. and the reasons being is that printk exports too many symbols.
> basically, the only 'safe' printk variant at the moment is
>
> printk()
> vprintk_func()
>
> because it 'respects' current printk context... and that's it.
> the reset of them will call vprintk_emit() regardless the context.
> this patch set fixes (in 0001) vprintk()
>
> vprintk()
> vprintk_func()

This is fixed by the 1st patch in this patchset.

> but we still have
>
> EXPORT_SYMBOL(vprintk_emit)
> EXPORT_SYMBOL(printk_emit)
> EXPORT_SYMBOL_GPL(vprintk_default)

It is a pity. Anyway, there should either be a big fat warning
in a comment about these functions. Or we should check/count
misuses and print a warning later, like we do with lost NMI messages.

Of course, we should revisit the exported status of these functions
and if they might get redirected to a save implementation. But
if anyone wants to shoot herself into her foot, we might hardly
prevent it.


> any of those called from alt_printk_log_store() will deadlock us.
>
> printk()
> vprintk_emit()
> alt_printk_enter()
> raw_spin_lock(&logbuf_lock)
> ...
> printk() << first recursion
> vprintk_alt()
> alt_printk_log_store()
> vprintk_default()
> ---> vprintk_emit() << second recursion
> raw_spin_lock(&logbuf_lock) << deadlock
> ...
> alt_printk_exit()
>
>
> our vprintk_alt()->alt_printk_log_store() seems to be *more or less*
> safe, for the time being.

I would solve this by a check and warning. People, should know
what they do when using vprintk_default()/emit() directly. If they
use it a wrong way, it is a bug that should get fixed.

As you suggest, alt_printk_log_store() is rather simple functions.
There is not much space for such an adventurous code.


> there are many WARN_ON_* on
> vprintk_alt()->alt_printk_log_store()->vsnprintf() path but they all
> seem to be calling printk():
>
> - WARN_ON_ONCE() from vsnprintf()
> - WARN_ONCE() from vsnprintf()->format_decode()
> - WARN_ON vsnprintf()->set_field_width()
> - WARN_ON from vsnprintf()->set_precision()
> - WARN_ON from vsnprintf()->pointer()->flags_string()
>
> a side note, some of these WARNs are... 'funny'. e.g., to deadlock a
> system it's enough to just pass an unsupported flag in format string.
> vsnprintf() will
> WARN_ONCE(1, "Please remove unsupported %%%c in format string\n", *fmt)
>
> but the problem is that we are already in printk():
>
> printk()
> raw_spin_lock(&logbuf_lock)
> text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
> WARN_ONCE(1, "Please remove unsupported ...)
> printk()
> raw_spin_lock(&logbuf_lock) << deadlock

Just for record. This vscnprintf() is called when logbuf_cpu is set.
Therefore this particular recursion is not possible at the moment.

Anyway, I agree that alt_printk_enter() calls might get nested.
The direct vprintk_emit() calls are one reason. I guess that
we will need to use the same counter/enter functions also
for WARN_*DEFERRED(). And it would cause nesting as well.

Therefore we need to be careful about loosing the original
value of irq flags.

The question is whether we need to store the flags in
a per-CPU variable. We might also store it on the stack
of the enter()/exit() function caller. I mean something like


void alt_printk_enter(unsigned long *flags)
{
preempt_disable();
if (!this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK)
local_irq_save(*flags);
preempt_enable();

this_cpu_inc(alt_printk_ctx);
}

void alt_printk_exit(unsigned long *flags)
{
this_cpu_dec(alt_printk_ctx);

if (!(this_cpu_read(alt_printk_ctx) & ALT_PRINTK_CONTEXT_MASK))
local_irq_restore(*flags);
}

I am not sure if it is less hairy. But it has some "advantages":

+ saves a per-CPU variable
+ symmetric number of irq_save()/restore() calls
+ the side effect with the irq disabling is more evident

Well, your variant might be less error prone.

Best Regards,
Petr


PS: I am sorry for writing so long answer. It took me some
time to understand why the code should be like this. The
answers helped me to get the picture. I hope that they are
somehow useful for others as well.