Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
From: Peter Zijlstra
Date: Thu Sep 05 2019 - 10:31:38 EST
On Thu, Sep 05, 2019 at 03:05:13PM +0200, Petr Mladek wrote:
> The serialized approach used a lock. It was re-entrant and thus less
> error-prone but still a lock.
>
> The lock was planed to be used not only to access the buffer but also
> for eventual locking inside lockless consoles. It might allow to
> have some synchronization even in lockless consoles. But it
> would be big-kernel-lock-like style. It might create yet
> another maze of problems.
I really don't see your point. All it does is limit buffer writers to a
single CPU, and does the same for the atomic/early console output.
But it must very much be a leaf lock -- that is, there must not be any
locking inside it -- and that is fine, if a console cannot do lockless
output, it simply cannot be marked as having an atomic/early console.
You've seen the force_earlyprintk patches I use [*], that stuff works
and is infinitely better than the current printk trainwreck -- and it
uses exactly such serialization -- although I only added it to make the
output actually readable. And _that_ is exactly why I propose adding it,
you need it _anyway_.
So the argument goes like:
- synchronous output to lockless consoles (early serial) is mandatory
- such output needs to be CPU serialized, otherwise it becomes
unreadable garbage.
- since we need that serialization anyway, might as well lift it up one
layer an put it around the buffer.
Since a single-cpu buffer writer can be wait free (and relatively
simple), the only possible waiting is on the lockless console (polling
until the UART is ready for it's next byte). There is nothing else. It
will make progress.
> If we remove per-CPU buffers in NMI. We would need to synchronize
> again printing backtraces from all CPUs. Otherwise they would get
> mixed and hard to read. It might be solved by some prefix and
> sorting in userspace but...
It must have cpu prefixes anyway; the multi-writer thing will equally
mix them together. This is a complete non sequitur.
That current printk stuff is just pure and utter crap. Those NMI buffers
are a trainwreck and need to die a horrible death.
> I agree that this lockless variant is really complicated. I am not
> able to prove that it is race free as it is now. I understand
> the algorithm. But there are too many synchronization points.
>
> Peter, have you seen my alternative approach, please. See
> https://lore.kernel.org/lkml/20190704103321.10022-1-pmladek@xxxxxxxx/
>
> It uses two tricks:
>
> 1. Two bits in the sequence number are used to track the state
> of the related data. It allows to implement the entire
> life cycle of each entry using atomic operation on a single
> variable.
>
> 2. There is a helper function to read valid data for each entry,
> see prb_read_desc(). It checks the state before and after
> reading the data to make sure that they are valid. And
> it includes the needed read barriers. As a result there
> are only three explicit barriers in the code. All other
> are implicitly done by cmpxchg() atomic operations.
>
> The alternative lockless approach is still more complicated than
> the serialized one. But I think that it is manageable thanks to
> the simplified state tracking. And I might safe use some pain
> in the long term.
I've not looked at it yet, sorry. But per the above argument of needing
the CPU serialization _anyway_, I don't see a compelling reason not to
use it.
It is simple, it works. Let's use it.
If you really fancy a multi-writer buffer, you can always switch to one
later, if you can convince someone it actually brings benefits and not
just head-aches.
So I have something roughly like the below; I'm suggesting you add the
line with + on:
int early_vprintk(const char *fmt, va_list args)
{
char buf[256]; // teh suck!
int old, n = vscnprintf(buf, sizeof(buf), fmt, args);
old = cpu_lock();
+ printk_buffer_store(buf, n);
early_console->write(early_console, buf, n);
cpu_unlock(old);
return n;
}
(yes, yes, we can get rid of the on-stack @buf thing with a
reserve+commit API, but who cares :-))
[*] git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental