Re: [PATCH] printk: inject caller information into the body of message

From: Sergey Senozhatsky
Date: Wed Sep 12 2018 - 02:53:17 EST


On (09/10/18 13:20), Alexander Potapenko wrote:
> > Awesome. If you and Fengguang can combine forces and lead the
> > whole thing towards "we couldn't care of pr_cont() less", it
> > would be really huuuuuge. Go for it!
>
> Sorry, folks, am I understanding right that pr_cont() and flushing the
> buffer on "\n" are two separate problems that can be handled outside
> Tetsuo's patchset, just assuming pr_cont() is unsupported?
> Or should the pr_cont() cleanup be a prerequisite for that?

Oh... Sorry. I'm quite overloaded at the moment and simply forgot about
this thread.

So what is exactly our problem with pr_cont -- it's not SMP friendly.
And this leads to various things, the most annoying of which is a
preliminary flush.

E.g. let me do a simple thing on my box:

ps aux | grep firefox
kill 2727

dmesg | tail
[ 554.098341] Chrome_~dThread[2823]: segfault at 0 ip 00007f5df153a1f3 sp 00007f5ded47ab00 error 6 in libxul.so[7f5df1531000+4b01000]
[ 554.098348] Code: e7 04 48 8d 15 a6 94 ae 03 48 89 10 c7 04 25 00 00 00 00 00 00 00 00 0f 0b 48 8b 05 57 d0 e7 04 48 8d 0d b0 94 ae 03 48 89 08 <c7> 04 25 00 00 00 00 00 00 00 00 0f 0b e8 4d f4 ff ff 48 8b 05 34
[ 554.109418] Chrome_~dThread[3047]: segfault at 0 ip 00007f3d5bdba1f3 sp 00007f3d57cfab00 error 6
[ 554.109421] Chrome_~dThread[3077]: segfault at 0 ip 00007fe773f661f3 sp 00007fe76fea6b00 error 6
[ 554.109424] in libxul.so[7f3d5bdb1000+4b01000]
[ 554.109426] in libxul.so[7fe773f5d000+4b01000]
[ 554.109429] Code: e7 04 48 8d 15 a6 94 ae 03 48 89 10 c7 04 25 00 00 00 00 00 00 00 00 0f 0b 48 8b 05 57 d0 e7 04 48 8d 0d b0 94 ae 03 48 89 08 <c7> 04 25 00 00 00 00 00 00 00 00 0f 0b e8 4d f4 ff ff 48 8b 05 34


Even such a simple thing as "printk several lines per-crashed process"
is broken. Look at line #0 and lines #2-#5.

And this is the only problem we probably need to address. Overlapping
printk lines -- when several CPUs printk simultaneously, or same CPUs
printk-s from IRQ, etc -- are here by design and it's not going to be
easy to change that (and maybe we shouldn't try).


Buffering multiple lines in printk buffer does not look so simple and
perhaps we should not try to do this, as well. Why:

- it's hard to decide what to do when buffer overflows

Switching to "normal printk" defeats the reason we do buffering in the
first place. Because "normal printk" permits overlapping. So buffering
makes a little sense if we are OK with switching to a "normal printk".

- the more we buffer the more we can lose in case of panic.

We can't flush_on_panic() printk buffers which were allocated on stack.

- flushing multiple lines should be more complex than just a simple
printk loop

while (1) {
x = memchr(buf, '\n', sz);
...
print("%s", buf);
...
}

Because "printk() loop" permits lines overlap. Hence buffering makes
little sense, once again.



So let's reduce the problem scope to "we want to have a replacement for
pr_cont()". And let's address pr_cont()'s "preliminary flush" issue only.


I scanned some of Linus' emails, and skimmed through previous discussions
on this topic. Let me quote Linus:

:
: My preference as a user is actually to just have a dynamically
: re-sizable buffer (that's pretty much what I've done in *every* single
: user space project I've had in the last decade), but because some
: users might have atomicity issues I do suspect that we should just use
: a stack buffer.
:
: And then perhaps say that the buffer size has to be capped at 80 characters.
:
: Because if you're printing more than 80 characters and expecting it
: all to fit on a line, you're doing something else wrong anyway.
:
: And hide it not as a explicit "char buffer[80]]" allocation, but as a
: "struct line_buffer" or similar, so that
:
: (a) people don't get the line size wrong
:
: (b) the buffering code can add a few fields for length etc in there too
:
: Introduce a few helper functions for it:
:
: init_line_buffer(&buf);
: print_line(&buf, fmt, args);
: vprint_line(&buf, fmt, vararg);
: finish_line(&buf);
:



And this is, basically, what I have attached to this email. It's very
simple and very short. And I think this is what Linus wanted us to do.

- usage example

DEFINE_PR_LINE(KERN_ERR, pl);

pr_line(&pl, "Hello, %s!\n", "buffer");
pr_line(&pl, "%s", "OK.\n");
pr_line(&pl, "Goodbye, %s", "buffer");
pr_line(&pl, "\n");

dmesg | tail

[ 69.908542] Hello, buffer!
[ 69.908544] OK.
[ 69.908545] Goodbye, buffer


- pr_cont-like usage

DEFINE_PR_LINE(KERN_ERR, pl);

pr_line(&pl,"%d ", 1);
pr_line(&pl,"%d ", 3);
pr_line(&pl,"%d ", 5);
pr_line(&pl,"%d ", 7);
pr_line(&pl,"%d\n", 9);

dmesg | tail

[ 69.908546] 1 3 5 7 9


- An explicit, aux buffer // output should be truncated

char buf[16];
DEFINE_PR_LINE_BUF(KERN_ERR, ps, buf, sizeof(buf));

pr_line(&ps, "Test test test test test test test test test\n");
pr_line(&ps, "\n");


dmesg | tail

[ 69.908547] Test test test ** truncated **


Opinions? Will this work for us?

====