Re: [PATCH v3 0/6] allow ramoops to collect all kmesg_dump events

From: Pavel Tatashin
Date: Tue May 12 2020 - 12:49:49 EST


On Tue, May 12, 2020 at 11:52 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Tue 2020-05-12 10:03:44, Pavel Tatashin wrote:
> > > OK, I personally see this as two separate problems:
> > >
> > > 1. Missing support to set loglevel per console.
> > > 2. Missing support to dump messages for other reasons.
> > >
> > > I would remove the paragraph about console log levels completely.
> >
> > OK, I see your point, this paragraph can be removed, however, I think
> > it makes it clear to understand the rationale for this change. As I
> > understand, the per console loglevel has been proposed but were never
> > accepted.
>
> The proposal was not accepted because there were more requirements:
>
> + add console device into sysfs so that it can be modified there
> + make a reasonable backward compatible behavior
>
> I guess that the sysfs interface discouraged the author to continue
> on it.
>
> Note that console loglevel handling is very complicated. There are
> already four variables, see console_printk array in
> kernel/printk/printk.c. Per console loglevel would make it even
> more complicated.
>
> It is a nighmare. And introducing max_reason parameter goes the same way.
>
> > > Now, the max_reason logic makes sense only when all the values
> > > have some ordering. Is this the case?
> > >
> > > I see it as two distinct sets:
> > >
> > > + panic, oops, emerg: describe how critical is an error situation
> > > + restart, halt, poweroff: describe behavior when the system goes down
> > >
> > > Let's say that panic is more critical than oops. Is restart more
> > > critical than halt?
> > >
> > > If you want the dump during restart. Does it mean that you want it
> > > also during emergency situation?
> > >
> > > My fear is that this patchset is going to introduce user interface
> > > (max_reason) with a weird logic. IMHO, max_reason is confusing even
> > > in the code and we should not spread this to users.
> > >
> > > Is there any reason why the existing printk.always_kmsg_dump option
> > > is not enough for you?
> >
> > printk.always_kmsg_dump is not working for me because ramoops has its
> > own filtering based on dump_oops boolean, and ignores everything but
> > panics and conditionally oops.
> > max_reason makes the ramoops internal logic cleaner compared to using dump_oops.
>
> I see. Just to be sure. Is the main reason to add max_reason parameter
> to keep complatibility of the deprecated dump_oops parameter? Or is
> there any real use case for this granularity?
>
> I made some arecheology. ramoops.dump_oops parameter was added in 2010 by the
> initial commit 56d611a04fb2db77334e ("char drivers: RAM oops/panic
> logger."
>
> Note that the initial implementation printed Oops reason only when
> dump_oops was set. It printed all other reasons otherwise. It seems
> that there were only the two reasons at that time.
>
> Now, printk.always_kmsg_dump parameter was added later in 2012 by
> the commit c22ab332902333f8376601 ("kmsg_dump: don't run on non-error
> paths by default").
>
> IMHO, the later commit actually fixed the default behavior of ramoops.
>
> I wonder if anyone is actually using the ramoops.dump_oops parameter
> in reality. I would personally make it deprecated and change the
> default behavior to work according to printk.always_kmsg_dump parameter.

This sounds alright to me with one slight problem. I am doing this
work for an embedded arm64 SoC, so controlling everything via device
tree is preferable compared to having some settings via device tree
and others via kernel parameters, especially because the kernel
parameters are hardcoded by firmware that we try not to update too
often for uptime reasons.

>
> IMHO, ramoops.dump_oops just increases complexity and should not have
> been introduced at all. I would try hard to avoid introducing even bigger
> complecity and mess.

I agree, amoops.dump_oops should be depricated with or without
max_reason change.

>
> I know that there is the "do not break existing userspace" rule. The
> question is if there is any user and if it is worth it.
>
> > I agree, the reasons in kmsg_dump_reason do not order well (I
> > actually want to add another reason for kexec type reboots, and where
> > do I put it?), so how about if we change the ordering list to
> > bitfield/flags, and instead of max_reason provide: "reasons" bitset?
>
> It looks too complicated. I would really try hard to avoid the
> parameter at all.

OK. Should we remove max_reason from struct kmsg_dumper and also
remove the misleading comment about kmsg_dump_reason ordering?

Thank you,
Pasha