Re: printk: what is going on with additional newlines?

From: Tetsuo Handa
Date: Sat Sep 02 2017 - 02:12:11 EST


Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 1:41 PM, Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> A private buffer has none of those issues.
> >
> > Yes, I posted "[PATCH] printk: Add best-effort printk() buffering." at
> > http://lkml.kernel.org/r/1493560477-3016-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx .
>
> No, this is exactly what I *don't* want, because it takes over printk() itself.
>
> And that's problematic, because nesting happens for various reasons.
>
> For example, you try to handle that nesting with printk_context(), and
> nothing when an interrupt happens.
>
> But that is fundamentally broken.
>
> Just to give an example: what if an interrupt happens, it does this
> buffering thing, then it gets interrupted by *another* interrupt, and
> now the printk from that other interrupt gets incorrectly nested
> together with the first one, because your "printk_context()" gives
> them the same context?

My assumption was that

(1) task context can be preempted by soft IRQ context, hard IRQ context and NMI context.
(2) soft IRQ context can be preempted by hard IRQ context and NMI context.
(3) hard IRQ context can be preempted by NMI context.
(4) An kernel-oops event can interrupt task context, soft IRQ context, hard IRQ context
and NMI context, but the interrupted context can not continue execution of
vprintk_default() after returning from the kernel-oops event even if the
kernel-oops event occurred in schedulable context and panic_on_oops == 0.

and thus my "printk_context()" gives them different context.

But my assumption was wrong that

soft IRQ context can be preempted by different soft IRQ context
(e.g. SoftIRQ1 can be preempted by SoftIRQ2 while running
handler for SoftIRQ1, and SoftIRQ2 can be preempted by SoftIRQ3
while running handler for SoftIRQ2, and so on)

hard IRQ context can be preempted by different hard IRQ context
(e.g. HardIRQ1 can be preempted by HardIRQ2 while running
handler for HardIRQ1, and HardIRQ2 can be preempted by HardIRQ3
while running handler for HardIRQ2, and so on)

? Then, we need to recognize how many IRQs are nested.

I just tried to distinguish context using one "unsigned long" value
by embedding IRQ status into lower bits of "struct task_struct *".
I can change to distinguish context using multiple "unsigned long" values.

>
> And it really doesn't have to even be interrupts. Look at what happens
> if you take a page fault in kernel space. Same exact deal. Both are
> sleeping contexts.

Is merging messages from outside a page fault and inside a page fault
so serious? That would happen only if memory access which might cause
a page fault occurs between get_printk_buffer() and put_printk_buffer(),
and I think that such user is rare.

>
> So I really think that the only thing that knows what the "context" is
> is the person who does the printing. So if you want to create a
> printing buffer, it should be explicit. You allocate the buffer ahead
> of time (perhaps on the stack, possibly using actual allocations), and
> you use that explicit context.

If my assumption was wrong, isn't it dangerous from stack usage point of
view that we try to call kmalloc() (or allocate from stack memory) for
prbuf_init() for each nested level because it is theoretically possible
that a different IRQ jumps in while kmalloc() is in progress (or stack
memory is in use)?

>
> Yes, it means that you don't do "printk()". You do an actual
> "buf_print()" or similar.
>
> Linus
>

My worry is that there are so many functions which will need to receive
"struct seq_buf *" argument (from tail of __dump_stack() to head of
out_of_memory(), including e.g. cpuset_print_current_mems_allowed()) that
patches for passing "struct seq_buf *" argument becomes so large and
difficult to synchronize. I tried to pass such argument to relevant
functions before I propose "[PATCH] printk: Add best-effort printk()
buffering." patch, but I came to conclusion that passing such argument is
too complicated and too much bloat compared to merit.

If we teach printk subsystem that "I want to use buffering" via
get_printk_buffer(), we don't need to scatter around "struct seq_buf *"
argument throughout the kernel.

Using kmalloc() for prbuf_init() also introduces problems such as

(a) we need to care about safe GFP flags (i.e. GFP_ATOMIC or
GFP_KERNEL or something else which cannot be managed by
current_gfp_context()) based on locking context

(b) allocations can fail, and printing allocation failure messages
when printing original messages is disturbing

(c) allocation stall/failure messages are printed under memory pressure,
but stack memory is not large enough to store messages related
allocation stall/failure messages

and thus I want to use "statically allocated buffer" like workqueue's
rescuer kernel thread which can be used under memory pressure.

Linus Torvalds wrote at http://lkml.kernel.org/r/CA+55aFxmL4ybpz19OPn97VYqAk2ZS-tf=0W2Ff1K=-UUB6mYyg@xxxxxxxxxxxxxx :
> On Fri, Sep 1, 2017 at 10:32 AM, Joe Perches <joe@xxxxxxxxxxx> wrote:
> >
> > Yes, it's a poor name. At least keep using a pr_ prefix.
>
> I'd suggest perhaps just "pr_line()".
>
> And instead of having those "err/info/cont" variations, the severity
> level should just be set at initialization time. Not different
> versions of "pr_line()".
>
> There's no point to having different severity variations, since the
> *only* reason for this would be for buffering. So "pr_cont()" is kind
> of assumed for everything but the first.

But it is annoying for me that

Lines1-for-event1-with-loglevel-foo
Lines2-for-event1-with-loglevel-bar
Lines3-for-event1-with-loglevel-baz

(like OOM killer messages) are all treated as loglevel foo
breaks console_loglevel filtering and

>
> And even if you end up doing multiple lines, if you actually do
> different severities, you damn well shouldn't buffer them together.
> They are clearly different things!

two series of messages

Line1-for-event1-with-loglevel-foo
Line2-for-event1-with-loglevel-bar
Line3-for-event1-with-loglevel-bar
Line4-for-event1-with-loglevel-bar
Line5-for-event1-with-loglevel-baz

by task/1000 and

Line1-for-event2-with-loglevel-foo
Line2-for-event2-with-loglevel-bar
Line3-for-event2-with-loglevel-bar
Line4-for-event2-with-loglevel-bar
Line5-for-event2-with-loglevel-baz

by task/1001 are mixed due to not to buffering lines with different
loglevels causes unreadable logs (unless printk() automatically
inserts context identifier into each line like

foo task/1000 Line1-for-event1-with-loglevel-foo
foo task/1001 Line1-for-event2-with-loglevel-foo
bar task/1001 Line2-for-event2-with-loglevel-bar
bar task/1001 Line3-for-event2-with-loglevel-bar
bar task/1001 Line4-for-event2-with-loglevel-bar
bar task/1000 Line2-for-event1-with-loglevel-bar
bar task/1000 Line3-for-event1-with-loglevel-bar
bar task/1000 Line4-for-event1-with-loglevel-bar
baz task/1000 Line5-for-event1-with-loglevel-baz
baz task/1001 Line5-for-event2-with-loglevel-baz

).