Re: [PATCH v3 0/6] allow ramoops to collect all kmesg_dump events

From: Kees Cook
Date: Tue May 12 2020 - 14:45:59 EST


On Tue, May 12, 2020 at 05:52:07PM +0200, Petr Mladek wrote:
> On Tue 2020-05-12 10:03:44, Pavel Tatashin wrote:
> > > OK, I personally see this as two separate problems:
> > >
> > > 1. Missing support to set loglevel per console.
> > > 2. Missing support to dump messages for other reasons.
> > >
> > > I would remove the paragraph about console log levels completely.
> >
> > OK, I see your point, this paragraph can be removed, however, I think
> > it makes it clear to understand the rationale for this change. As I
> > understand, the per console loglevel has been proposed but were never
> > accepted.

I understood Pavel's rationale as an output from my questions in the v1
series, that went like this, paraphrased:

Pavel: "I need to have other kmsg dump reasons available to pstore."
Kees: "Why can't you just use the pstore console dumper?"
Pavel: "It's too much for the slow device; we only need to know about
specific events that are already provided by kmsg dump."
Kees: "Ah! Sounds good, max_reasons it is."

So, AIUI, loglevel remains orthogonal to this, and it's my fault for
even causing to be be brought up. Please disregard! :)

> > printk.always_kmsg_dump is not working for me because ramoops has its
> > own filtering based on dump_oops boolean, and ignores everything but
> > panics and conditionally oops.
> > max_reason makes the ramoops internal logic cleaner compared to using dump_oops.
>
> I see. Just to be sure. Is the main reason to add max_reason parameter
> to keep complatibility of the deprecated dump_oops parameter? Or is
> there any real use case for this granularity?

In my mind it seemed like a nice mapping, so it was an easy port.

> I wonder if anyone is actually using the ramoops.dump_oops parameter
> in reality. I would personally make it deprecated and change the
> default behavior to work according to printk.always_kmsg_dump parameter.

Yes. For things I'm aware of: ARM devices with very tiny persistent RAM
were using ramoops and setting dump_oops to 0 (specifically, setting
the DT "no-dump-oops" to 1), and larger Android and Chrome OS devices
using ramoops were setting to dump_oops to 1[1].

The logic built into pstore recognizes a difference between panic and
non-panic dumps as well, as the expectation is that there is little to
no kernel infrastructure available for use during a panic kmsg.

> IMHO, ramoops.dump_oops just increases complexity and should not have
> been introduced at all. I would try hard to avoid introducing even bigger
> complecity and mess.

I think dump_oops was the wrong implementation, but granularity control
is still needed. It is an old parameter, and is baked into many device
trees on systems, so I can't just drop it. (In fact, I've had to support
some other DT compat issues[2] as well.)

> I know that there is the "do not break existing userspace" rule. The
> question is if there is any user and if it is worth it.

For dump_oops, yes, there is unfortunately.

> > I agree, the reasons in kmsg_dump_reason do not order well (I
> > actually want to add another reason for kexec type reboots, and where
> > do I put it?), so how about if we change the ordering list to
> > bitfield/flags, and instead of max_reason provide: "reasons" bitset?
>
> It looks too complicated. I would really try hard to avoid the
> parameter at all.

Here are the problems I see being solved by this:

- lifting kmsg dump reason filtering out of the individual pstore
backends and making it part of the "infrastructure", so that
there is a central place to set expectations. Right now there
is a mix of explicit and implicit kmsg dump handling:

- arch/powerpc/kernel/nvram_64.c has a hard-coded list
- drivers/firmware/efi/efi-pstore.c doesn't expect anything but
OOPS and PANIC.
- drivers/mtd/mtdoops.c tries to filter using its own dump_oops
and doesn't expect anything but OOPS and PANIC.
- fs/pstore/ram.c: has a hard-coded list and uses its own
dump_oops.
- drivers/mtd/mtdpstore.c (under development[3]) expected only
OOPS and PANIC and had its own dump_oops.

- providing a way for backends that can deal with all kmsg dump reasons
to do so without breaking existing default behavior (i.e. getting
Pavel what he's interested in).

So, that said, I'm totally fine with a bit field. I just need a way to
map the kmsg dump reasons onto the existing backend expectations and to
have Pavel's needs addressed.

-Kees

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/platform/chrome/chromeos_pstore.c?h=v5.6#n60
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/pstore/ram.c?h=v5.6#n708
[3] https://lore.kernel.org/lkml/20200511233229.27745-11-keescook@xxxxxxxxxxxx/

--
Kees Cook