Re: [RFC PATCH v4 9/9] printk: use a new ringbuffer implementation

From: Linus Torvalds
Date: Thu Aug 08 2019 - 19:33:44 EST


On Thu, Aug 8, 2019 at 3:56 PM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
>
> On a side note, I'm not sure sure if we want kernel code to do the ASCII
> dump of the raw buffer. A userspace application grabbing from /dev/mem
> might be more desirable.

So there are two major issues here.

One obvious one is that the *current* boot must not overwrite the previous data.

With the separate stupid ring buffer, that isn't an issue. We just
won't start logging into it at all until after it's been registered,
so the solution is "read it out before registering it again" - and
that can indeed be done at almost any point (including by a user space
thing).

But with your suggestion of just using the native ring buffer you
already have, that's not an option. The current boot will have started
using it already and is overwriting the data we wanted from the
previous boot.

To which the obvious solution is "just use a different buffer for the
next boot". But that brings up the *second* big issue with a
reboot-safe buffer: it can't just be anywhere. Not only do you have to
have some way to find the old one, the actual location may end up
being basically fixed by hardware or firmware.

The reason for that is that while the patch I sent out actually works
fine on lots of machines in practice, it has a serious issue: it only
works for a nice warm reset, and only when the BIOS doesn't overwrite
memory at reset. Both of those are problems.

Most BIOSes don't overwrite all memory for a warm reset, simply
because booting fast is an issue. But some BIOSes _do_ re-initialize
everything. And in particular, they tend to do so after cold boots,
particularly on server machines where you may have "oh, ECC needs to
be re-initialized".

End result: in reality, my hacky patch is a "look, if we had firmware
support for not re-initializing a small piece of RAM, we could use
this". I've asked Intel for a fast logging facility in hardware for
basically two decades by now, and it's not happening. But I'm hoping
that I _can_ ask them for "hey, can you make your firmware not reinit
a small 64kB buffer at boot, so that it will survive even a cold boot
as long as the power-off was very short".

(And the "64kB" above is just a random number. Maybe it's 4kB.
Something fairly small, and something that the BIOS would not reinit,
and in the presense of ECC it would possibly just scrub - read old
contents without ECC, write them back with ECC).

See? That basically fixes the reboot-safe buffer at a particular
memory area, and means that you can't share the same buffer for the
basic logging as for the reboot-safe case..

That's why I also reacted to your POWER NVDIMM thing - it actually has
some of the same issues. It's fixed by external forces, and not a
generic random piece of memory.

Linus