Re: [RFC][PATCH] printk: Fixup the nmi printk mess

From: Petr Mladek
Date: Thu Jun 11 2015 - 10:56:00 EST


On Wed 2015-06-10 21:23:04, Peter Zijlstra wrote:
> Below a version which does x-cpu stuff to allow the
> trigger_all*_cpu_backtrace() initiator to flush buffers on behalf of
> other CPUs.
>
> Compile tested only.

The output from "echo l >/proc/sysrq-trigger" looks reasonable.
It does not mix output from different CPUs. This is expected
because of the @lock.

The other observation is that it prints CPUs in _random_ order:
28, 24, 25, 1, 26, 2, 27, 3, ...

The order is fine when I disable the irq_work.

This means that the irq_work usually runs before printk_nmi_flush()
gets to the buffer => printk_nmi_flush() is not that useful => all
the complexity with the three atomic variables (head, tail, read)
does not bring much of a win.

Anyway, I think that the current solution is racy and it cannot be fixed
easily, see below.


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c099b082cd02..99bfc1e3f32a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1821,13 +1821,200 @@ int vprintk_default(const char *fmt, va_list args)
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> + static raw_spinlock_t lock = __RAW_SPIN_LOCK_INITIALIZER(lock);
> + struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> + int len, head, size, i, last_i;
> +
> +again:
> + /*
> + * vprintk_nmi() truncate
> + *
> + * [S] head [S] head
> + * wmb mb
> + * [S] tail [S] read

BTW, this is quite cryptic to me. Coffee did not help :-)

> + *
> + * therefore:
> + */
> + i = atomic_read(&s->read);
> + len = atomic_read(&s->tail); /* up to the tail is stable */
> + smp_rmb();
> + head = atomic_read(&s->head);
> +
> + /*
> + * We cannot truncate tail because it could overwrite a store from
> + * vprintk_nmi(), however vprintk_nmi() will always update tail to the
> + * correct value.
> + *
> + * Therefore if head < tail, we missed a truncate and should do so now.
> + */
> + if (head < len)
> + len = 0;

This is a bit confusing. It is a complicated way to return at the next test.

If I get this correctly, this can happen only inside
__printk_nmi_flush() called on another CPU (from
printk_nmi_flush()) when it races with the queued
irq_work. The irq_work is faster and truncates the buffer.

So, the return is fine after all because the irq_work has already
printed everything.
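
To spell out why len = 0 is just a detour to the return, here is a
minimal userspace sketch of the two tests. sketch_flush_len() is a
hypothetical helper, not part of the patch, and it works on plain ints
that stand for the values sampled from @read, @tail and @head:

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: returns how many bytes a
 * flusher would print for the sampled values of read, tail and head.
 */
static int sketch_flush_len(int read, int tail, int head)
{
	int len = tail;

	assert(read >= 0 && tail >= 0 && head >= 0);

	if (head < len)		/* we missed a truncate */
		len = 0;
	if (len - read <= 0)	/* nothing to do */
		return 0;
	return len - read;
}
```

Because the sampled read is never negative, len = 0 always makes
len - read <= 0, so the missed-truncate case always ends in the return.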


> + if (len - i <= 0) /* nothing to do */
> + return;
> + /*
> + * 'Consume' this chunk, avoids concurrent callers printing the same
> + * stuff.
> + */
> + if (atomic_cmpxchg(&s->read, i, len) != i)
> + goto again;

I think that this is racy:

CPU0                                        CPU7

printk_nmi_flush()

__printk_nmi_flush(for CPU7)

i = atomic_read(&s->read);    (100)
len = atomic_read(&s->tail);  (200)
head = atomic_read(&s->head); (200)

if (atomic_cmpxchg(&s->read, i, len) != i)

  we pass but we get interrupted
  or rescheduled on preemptive kernel

                                            another vprintk_nmi()
                                            leaves: head(400), tail(400)

                                            __printk_nmi_flush() in irq_work

                                            it prints string between 200-400
                                            truncate buffer: head(0), read(0)

                                            another vprintk_nmi()
                                            returns: head(150), tail(150)

print string between (100-200) =>
part of the new and part of old message
and modifies @head and @read a wrong way

I think that such races are hard to avoid without indexing the printed
messages. But it would make the approach too complicated.
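
To make the interleaving above concrete, here is a minimal userspace
sketch that replays it step by step in a single thread. Everything in
it is an assumption for illustration: struct nmi_buf_sketch and the
sketch_*() helpers are hypothetical stand-ins for struct nmi_seq_buf,
vprintk_nmi() and __printk_nmi_flush(), and C11 atomics replace the
kernel's atomic_t API. It only shows what the stale reader ends up
copying, not the later damage to @head and @read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SKETCH_BUF_SIZE 512

/* Hypothetical stand-in for the patch's struct nmi_seq_buf. */
struct nmi_buf_sketch {
	atomic_int head;	/* end of reserved space */
	atomic_int tail;	/* end of stable data */
	atomic_int read;	/* end of consumed data */
	char data[SKETCH_BUF_SIZE];
};

/* Writer (vprintk_nmi() stand-in): append len bytes of c at head. */
static void sketch_write(struct nmi_buf_sketch *s, char c, int len)
{
	int start = atomic_load(&s->head);

	assert(start + len <= SKETCH_BUF_SIZE);
	memset(s->data + start, c, (size_t)len);
	atomic_store(&s->head, start + len);
	atomic_store(&s->tail, start + len);
}

/* First half of a flush: claim [*i, *len). Returns 1 on success. */
static int sketch_claim(struct nmi_buf_sketch *s, int *i, int *len)
{
	*i = atomic_load(&s->read);
	*len = atomic_load(&s->tail);
	if (*len - *i <= 0)
		return 0;
	return atomic_compare_exchange_strong(&s->read, i, *len);
}

/* Second half: copy the claimed range to out, optionally truncate. */
static int sketch_emit(struct nmi_buf_sketch *s, int i, int len,
		       char *out, int truncate)
{
	memcpy(out, s->data + i, (size_t)(len - i));
	if (truncate) {
		atomic_store(&s->head, 0);
		atomic_store(&s->read, 0);
	}
	return len - i;
}

/* Replays the race; returns the number of bytes "CPU0" copies to out. */
static int replay_race(char *out)
{
	struct nmi_buf_sketch s;
	char scratch[SKETCH_BUF_SIZE];
	int i0, len0, i7, len7;

	atomic_init(&s.head, 0);
	atomic_init(&s.tail, 0);
	atomic_init(&s.read, 0);
	memset(s.data, 0, sizeof(s.data));

	sketch_write(&s, 'A', 200);	/* old message, tail = 200 */
	atomic_store(&s.read, 100);	/* first 100 bytes already printed */

	/* CPU0: claims [100, 200), then stalls before printing. */
	if (!sketch_claim(&s, &i0, &len0))
		return -1;

	/* CPU7: another vprintk_nmi(), head = tail = 400. */
	sketch_write(&s, 'B', 200);

	/* CPU7 irq_work: prints [200, 400) and truncates. */
	if (!sketch_claim(&s, &i7, &len7))
		return -1;
	sketch_emit(&s, i7, len7, scratch, 1);

	/* CPU7: new message at the buffer start, head = tail = 150. */
	sketch_write(&s, 'C', 150);

	/* CPU0 resumes and copies its stale range [100, 200). */
	return sketch_emit(&s, i0, len0, out, 0);
}
```

In this replay the 100 bytes that CPU0 copies start with bytes of the
new message ('C') and end with leftover bytes of the old one ('A'),
which is the mixed output described above.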

I think that ordering CPUs is not worth it. I would go back to the
first solution, add the @lock there, and double check races with
seq_buf().

I will stop commenting on the code here for now.

Best Regards,
Petr

PS: I had two cups of coffee and hope that my comments are less of a
fiasco than yesterday's.