Re: [PATCH v6 3/4] printk: Userspace format indexing support

From: Chris Down
Date: Tue Jun 01 2021 - 11:16:04 EST


Hey Petr,

Thanks for the feedback -- apologies for the delay in getting back.

Anything not explicitly replied to here is acked and will be fixed for v7.

Petr Mladek writes:
Are you aware of any subsystem having some extra post_fmt, please?

Good thinking -- I just checked and I didn't find any, so I guess we can add that later if needed and do without it for now. :-)

Well, the main question is whether we need to store this
at all. Does it bring any useful information?

Note that we do not store the value defined by pr_fmt() which
is yet another automatically added prefix.

Hmm, we do actually store pr_fmt, since it's known at compile time (or we wouldn't store it at all if it was dynamic due to the __builtin_constant_p check). Or are you seeing something different?

I would personally omit these prefixes. The most important
information is:

+ fmt: is the pattern that the system monitors would most likely
look for.

+ level: says whether the string will appear on console that
shows only messages below a certain console_loglevel

+ func, file, line: help to find the string in the kernel sources.
It is useful when investigating what has changed.

For example, pre_fmt="%s %s:" used by dev_printk (4th patch)
is not much useful.

In this case it's not super useful, but for cases where you can put something more useful (like "BTRFS: ...") it can make matching significantly less ambiguous.

Having this information means that system monitors know that they can anchor the matching at both ends when linking a printk format with some output, rather than having to look for a substring (in which case things become more difficult to determine).

I agree that in this particular case with dev_printk it's not incredibly useful, but there are certainly others we may want to have in future (like the btrfs case) where it would certainly help disambiguate. :-)


+} __packed;
+
+#define __printk_index_emit(_fmt, _level, _pre_fmt, _post_fmt) \
+ ({ \
+ if (__builtin_constant_p(_fmt) && __builtin_constant_p(_level)) { \
+ /*
+ * The compiler may not be able to eliminate the
+ * non-constant variants of _fmt and _level, so we need
+ * to make sure that it doesn't see any hypothetical
+ * assignment for non-constants even though this is
+ * already inside the __builtin_constant_p guard.
+ */ \
+ static const struct pi_entry _entry \
+ __used = { \
+ .fmt = __builtin_constant_p(_fmt) ? (_fmt) : NULL, \
+ .func = __func__, \
+ .file = __FILE__, \
+ .line = __LINE__, \
+ .level = __builtin_constant_p(_level) ? (_level) : NULL, \
+ .pre_fmt = _pre_fmt, \

Should this also be?

.pre_fmt = __builtin_constant_p(_pre_fmt) ? _pre_fmt : NULL


Hmm, if pre_fmt for a subsystem isn't known at compile time, it's an error on the developer's end -- otherwise things can't be stored anyway and there's no point.

+ * pre and post must be known at compile time, or compilation will fail (since
+ * this is a mistake). If fmt or level is not known at compile time, no index
+ * entry will be made (since this can legitimately happen).

How is this achieved, please?

__printk_index_emit() creates the entry when the following check is true:

if (__builtin_constant_p(_fmt) && __builtin_constant_p(_level))

It checks neither _pre_fmt nor _post_fmt.

Maybe my phrasing wasn't clear, but compilation would fail in that case because `.pre_fmt = _pre_fmt` will be a non-constant static initialiser, no?

+ if (!entry->fmt)
+ return 0;

Is this just a paranoid check or is it a valid case?

It happens only for the following printk:

arch/x86/mm/fault.c:534 show_fault_oops "BUG: kernel NULL pointer dereference, address: %px\n"

I must confess, I'm not entirely sure why, although it's ironic/interesting that it happens on _the_ NULL-related deref printk...

Is is possible to update __printk_index_emit() to do not create
entries with fmt = NULL ?

We should either remove the above check or add a comment
explaining why it is necessary.

The trick is unfortunately necessary for the reason described in the comment above the double check:

/*
* The compiler may not be able to eliminate the
* non-constant variants of _fmt and _level, so we need
* to make sure that it doesn't see any hypothetical
* assignment for non-constants even though this is
* already inside the __builtin_constant_p guard.
*/

Happy to add another comment if it helps.


+
+ if (entry->level)
+ printk_parse_prefix(entry->level, &level, &flags);
+ else
+ prefix_len = printk_parse_prefix(entry->fmt, &level, &flags);

This is missing:

if (level == LOGLEVEL_DEFAULT)
level = default_message_loglevel;

Without it, it produces lines with loglevel <-1>, for example:

<-1> init/do_mounts.c:457 mount_block_root "\n"
<-1> init/do_mounts.c:456 mount_block_root " %s"
<-1> init/do_mounts.c:454 mount_block_root "No filesystem could mount root, tried: "

Hmm, from the perspective of someone building this printk-checking infrastructure, I want -1 here instead of dynamically changing the output to default_message_loglevel, because default_message_loglevel may not be the same across systems. For example, having "-1" allows people to know these will come out at whatever the default message loglevel is, rather than necessarily being always at (say) level 4 or whatever it happens to be set at on the running system.

Without that, it's also not possible to compare entries across systems or across time, because the default loglevel may dynamically change at runtime, and it's not possible to disambiguate "will always be KERN_X" from "happens to be KERN_X" on this system.

And if someone really does want to know what value it will be right now, they can just look at /proc/sys/kernel/printk and change it out as desired.

+
+ seq_printf(s, "<%d%s> %s:%d %s \"",
+ level, flags & LOG_CONT ? ",c" : "", entry->file,
+ entry->line, entry->func);

It produces the following for continuous lines:

<-1,c> arch/x86/events/core.c:2101 init_hw_perf_events "%s PMU driver.\n"
<-1,c> arch/x86/events/core.c:2091 init_hw_perf_events "no PMU driver, software events only.\n"

But we should print the loglevel only when explicitly defined.
So I would do it the following way:

if (flags & LOG_CONT) {
if (level == LOGLEVEL_DEFAULT)
seq_printf(s, "<c>");
else
seq_printf(s, "<%d,c>", level);
} else {
if (level == LOGLEVEL_DEFAULT)
level = default_message_loglevel;
seq_printf(s, "<%d>", level);
}

seq_printf(s, " %s:%d %s \"", entry->file, entry->line, entry->func);

Ah yes, that's a decent point, since we don't treat this as meaning LOGLEVEL_DEFAULT. I'll have a think about how to make it better for v7.

I am really happy with the progress. The remaining problems are mostly
with the support for the subsystem-specific printk() callers that was
added in this revision.

That's great! I really appreciate your time and your diligent review, you've definitely helped improve this patchset for the better :-)

Thanks,

Chris