Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock
From: Petr Mladek
Date: Mon Dec 07 2020 - 06:18:25 EST
On Sun 2020-12-06 22:50:54, John Ogness wrote:
> On 2020-12-04, Petr Mladek <pmladek@xxxxxxxx> wrote:
> > On Tue 2020-12-01 21:59:41, John Ogness wrote:
> >> Since the ringbuffer is lockless, there is no need for it to be
> >> protected by @logbuf_lock. Remove @logbuf_lock.
> >>
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> > What is the exact reason to disable interrupts around the entire
> > vprintk_store(), please? It should be documented.
>
> It simplifies the context tracking. Also, in mainline interrupts are
> already disabled for all of vprintk_store(). AFAIK latencies due to
> logbuf_lock contention were not an issue.
>
> I really don't want to touch task_struct. IMHO the usefulness of that
> struct is limited, considering that printk can be called from scheduling
> and interrupting contexts.
Fair enough. I am fine with the per-CPU variables and the disabled
interrupts around vprintk_store().
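With interrupts disabled, the task cannot migrate to another CPU, and only an NMI can nest on top of the current context. That is presumably why one plain counter per CPU and context is enough. The userspace model below sketches the counter lookup using the array layout suggested later in this thread (a per-CPU array plus an early array of the same shape). The CPU index parameter, `NR_CPUS_MODEL`, the four `enum printk_ctx` values and the `percpu_data_ready` flag are assumptions that stand in for the kernel's per-CPU API and printk_percpu_data_ready(). This is not the patch's code.

```c
#include <stdbool.h>

/* Hypothetical context split; the patch only tells us there are NUM_RECURSION_CTX slots. */
enum printk_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ, CTX_NMI, NUM_RECURSION_CTX };

#define NR_CPUS_MODEL 4 /* assumption: fixed CPU count for the model */

/* Models DEFINE_PER_CPU(char [NUM_RECURSION_CTX], printk_count). */
static char printk_count[NR_CPUS_MODEL][NUM_RECURSION_CTX];
/* Used before per-CPU areas exist, when only the boot CPU runs. */
static char printk_count_early[NUM_RECURSION_CTX];
/* Models printk_percpu_data_ready(). */
static bool percpu_data_ready;

/*
 * Return the recursion counter for @ctx on @cpu. The caller is assumed
 * to run with interrupts disabled, so @cpu and @ctx stay valid until
 * the matching exit.
 */
static char *get_printk_count(int cpu, enum printk_ctx ctx)
{
	if (!percpu_data_ready)
		return &printk_count_early[ctx];
	return &printk_count[cpu][ctx];
}
```

Because the early counters have the same shape as the per-CPU ones, both paths hand back the same `char *` and no wrapper struct is needed.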
Note: We should also avoid calling console_trylock() for recursive
messages, to prevent an infinite loop or even a deadlock in this part of the code.
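One way to follow this note is to let only the outermost printk() in a given CPU/context attempt the console lock. Nested calls just store their record and return. The sketch below is a single-threaded userspace model. `nesting`, `PRINTK_MAX_NESTING`, `records_stored` and `flush_consoles()` are hypothetical stand-ins for the per-CPU counter, the recursion limit, the ringbuffer and console_trylock()/console_unlock(). The `nested_msgs` argument simulates formatting code that itself calls printk().

```c
#include <stdbool.h>

#define PRINTK_MAX_NESTING 2 /* hypothetical: nested levels allowed beyond the outermost call */

static int nesting;         /* printk() calls currently active in this context */
static int records_stored;  /* stands in for the ringbuffer */
static int console_flushes; /* how often the console lock was attempted */

/* Stand-in for console_trylock() + console_unlock(). */
static void flush_consoles(void)
{
	console_flushes++;
}

/*
 * Model of printk(). @nested_msgs is how many further printk() calls the
 * formatting of this message triggers in a chain (e.g. a warning from
 * vsprintf code). Returns false if the message was dropped.
 */
static bool model_printk(int nested_msgs)
{
	bool outermost;

	if (nesting > PRINTK_MAX_NESTING)
		return false; /* too deep: drop instead of recursing further */

	outermost = (nesting == 0);
	nesting++;
	records_stored++;
	if (nested_msgs > 0)
		model_printk(nested_msgs - 1);
	nesting--;

	/* Nested calls never touch the console lock; the outermost one flushes. */
	if (outermost)
		flush_consoles();
	return true;
}
```

In this model a chain of five nested messages stores three records (the outer one plus two nested levels) and tries the console lock only once.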
> >> +
> >> +struct printk_recursion {
> >> + char count[NUM_RECURSION_CTX];
> >> +};
> >>
> >> +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion);
> >> +static char printk_recursion_count[NUM_RECURSION_CTX];
> >
> > This is pretty confusing. The array is hidden in a struct when per-cpu
> > variables are used. And a naked array is used for early boot.
> >
> > Is the structure really needed? What about:
> >
> > static DEFINE_PER_CPU(char [NUM_RECURSION_CTX], printk_count);
> > static char printk_count_early[NUM_RECURSION_CTX];
>
> OK.
>
> >> +
> >> +static char *get_printk_count(void)
> >> +{
> >> + struct printk_recursion *rec;
> >> + char *count;
> >> +
> >> + if (!printk_percpu_data_ready()) {
> >> + count = &printk_recursion_count[0];
> >
> > I see why you avoided per-cpu variables for early boot. I am just
> > curious how the printk_context variable works these days. It is used by
> > any printk(), including early code, see vprintk_func().
>
> IMO printk_context is serving a different purpose. With the existence of
> logbuf_lock, printk_context exists for the sole purpose of making sure
> logbuf_lock is not taken recursively or that the CPU does not spin on it
> in NMI context. printk_context is simply gating calls to the safe
> buffers.
Well, both printk_count and printk_context are able to track recursion
in different contexts. Both are used to decide how printk() will
behave...
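The difference between the two mechanisms can be made concrete with a small model. printk_context is a set of flags that the caller raises to redirect printk() into the safe (or NMI) buffers. A recursion counter only records how deeply printk() is nested and drops messages past a limit. The mask names, the enum and the limit parameter below are hypothetical. They are loosely modeled on the flag checks in vprintk_func() and are not the kernel's definitions.

```c
/* Hypothetical flag bits modeled on the per-CPU printk_context mask. */
#define MODEL_SAFE_CONTEXT 0x1u
#define MODEL_NMI_CONTEXT  0x2u

enum printk_path { PATH_MAIN_LOG, PATH_SAFE_BUFFER, PATH_NMI_BUFFER, PATH_DROPPED };

/* printk_context style: flags set by the caller choose the buffer. */
static enum printk_path route_by_context(unsigned int context_flags)
{
	if (context_flags & MODEL_NMI_CONTEXT)
		return PATH_NMI_BUFFER;
	if (context_flags & MODEL_SAFE_CONTEXT)
		return PATH_SAFE_BUFFER;
	return PATH_MAIN_LOG;
}

/*
 * Recursion-counter style: everything goes to the lockless ringbuffer
 * unless more than @max_nesting nested printk() calls are already active.
 */
static enum printk_path route_by_count(unsigned int active_calls,
				       unsigned int max_nesting)
{
	if (active_calls > max_nesting)
		return PATH_DROPPED;
	return PATH_MAIN_LOG;
}
```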
Anyway, there is no need to fight over words. You write below that
the plan is to remove printk_safe, including printk_context. That
will solve my problem.

I am fine with having both of them for a transition period. I guess
that it will make our lives easier from the coding and review points
of view.
> >> +static bool printk_enter(unsigned long *flags)
> >> +{
> >> + char *count;
> >> +
> >> + local_irq_save(*flags);
> >> + count = get_printk_count();
> >> + /* Only 1 level of recursion allowed. */
> >
> > We should allow at least some level of recursion. Otherwise, we would
> > not see warnings printed from vsprintf code.
>
> With 1 level, you will see warnings from vsprintf code. I'm not sure it
> makes sense to allow more than 1 level. It causes exponential logging.
Shame on me. I missed that 1 level was actually allowed.

Anyway, I would like to allow at least 2 levels of recursion. Something
like 5 levels would be even better.

We need to know when there is a problem storing a recursive/nested
message. And I would rather see the same message repeated 3 times than
not see it at all.
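The trade-off between the concern about exponential logging and the wish for deeper recursion can be put into numbers. Assume, hypothetically, that every stored record triggers `fanout` nested messages and that nested printk() calls deeper than `max_nesting` levels are dropped. The number of stored records is then 1 + fanout + fanout^2 + ... + fanout^max_nesting. The recursive function below counts this directly. It is a counting model only, not printk code.

```c
/*
 * Count the records stored when each record triggers @fanout nested
 * printk() calls and nested calls deeper than @max_nesting are dropped.
 * @level is the nesting level of the current record (0 = original call).
 */
static unsigned long stored_records(unsigned int level, unsigned int max_nesting,
				    unsigned int fanout)
{
	unsigned long total = 1; /* the record at this level */
	unsigned int i;

	if (level >= max_nesting)
		return total; /* its nested printk() calls would be dropped */
	for (i = 0; i < fanout; i++)
		total += stored_records(level + 1, max_nesting, fanout);
	return total;
}
```

With one nested message per record (fanout 1), a limit of 5 levels costs only 6 copies. With fanout 2, the same limit already yields 63 records.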
Note that the risk of infinite recursion is pretty low. Most of the
code called from vprintk_emit() is under our control. There are many
pr_*_once() and WARN_ONCE() calls. The error messages use rather simple,
commonly used formatting, so the risk of recursive errors in vsprintf()
code is low.
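The once-style helpers limit the recursion risk in the following way. Suppose the code that fails while formatting reports the failure through a once-guard. The warning's own nested printk() then finds the guard already set and does not recurse again, whatever the nesting limit is. `broken_format()`, `log_record()` and the static `warned` flag below are hypothetical. The flag only imitates the per-call-site state that WARN_ONCE() and the pr_*_once() helpers keep.

```c
#include <stdbool.h>

static int records; /* records stored in the log (model) */

static void log_record(unsigned int level, unsigned int max_nesting);

/* Formatting code that hits the same bug on every call but warns only once. */
static void broken_format(unsigned int level, unsigned int max_nesting)
{
	static bool warned; /* imitates the per-call-site WARN_ONCE() flag */

	if (warned)
		return;
	warned = true;
	/* The warning is itself a nested printk(), one level deeper. */
	log_record(level + 1, max_nesting);
}

static void log_record(unsigned int level, unsigned int max_nesting)
{
	if (level > max_nesting)
		return; /* nested too deep: dropped */
	records++;
	broken_format(level, max_nesting);
}
```

Without the guard, the same chain would store max_nesting + 1 records (6 for a limit of 5) on every call. With the guard, the first call stores 2 records and each later call stores 1.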
> > Or is there any plan to remove printk_safe and printk_context?
I am relieved to read this.

Do not get me wrong. printk_safe was a really great idea and served its
purpose well. But it is yet another piece of tricky lockless code. There
is additional complexity with flushing the temporary buffers and handling
panic(). It is nice that we will be able to remove some of this complexity.
Best Regards,
Petr