Re: [PATCH 1/3] ring-buffer: Add uname to match criteria for persistent ring buffer

From: Linus Torvalds
Date: Tue Dec 17 2024 - 18:33:07 EST


On Tue, 17 Dec 2024 at 14:52, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> Where the buf value holds the binary storage from vbin_printf() and written
> in trace_vbprintk(). Yeah, it looks like it does depend on the arguments
> being word aligned.

So the thing is, they actually aren't word-aligned at all.

Yes, the buffer is ostensibly individual words, and the buffer size is
given in words, and 64-bit data is saved/fetched as two separate
words.

But if you look more closely, you'll see that the way the buffer is
managed is actually not as a word array at all, but using

char *str, *end;

instead of word pointers. And the reason is because it does

str = PTR_ALIGN(str, sizeof(type));
...
str += sizeof(type);

so byte-sized data is *not* given a word, it's only given a single
char, and then if it's followed by an "int", the pointer will be
aligned at that point.

It does mean that the binary buffers are a bit denser, but since %c
and %hhd etc are VERY unusual (and often will be mixed with int-sized
data), that denser format isn't commonly an actual advantage.

And the reason I noticed this? When I was trying to clean up and
simplify the vsnprintf() code to not have so many different cases, the
*regular* number handling for char/short/int-sized cases ends up being
just one case that looks like this:

num = get_num(va_arg(args, int), fmt.state, spec);

because char/short/int are all just "va_arg(args, int)" values.

Then the actual printout depends on that printf_spec thing (and, in my
experimental branch, that "fmt.state", but that's a local trial thing
where I split the printf_spec differently for better code generation).

So basically the core vfsprintf case doesn't need to care about
fetching char/short/int, because va_args always just formats those
kinds of arguments as int, as per the usual C implicit type expansion
rules.

But the binary printf thing then has to have three different cases,
because unlike the normal calling convention, it actually packs
char/short/int differently. So with all those nice cleanups I tried
still look like this:

case FORMAT_STATE_2BYTE:
num = get_num(get_arg(short), fmt.state, spec);
break;
case FORMAT_STATE_1BYTE:
num = get_num(get_arg(char), fmt.state, spec);
break;
default:
num = get_num(get_arg(int), fmt.state, spec);
break;

which is admittedly still a lot better than the current situation in
top-of-tree, which has separate versions for a *lot* more types.

So right now top-of-tree has 11 different enumerated cases for number printing:

FORMAT_TYPE_LONG_LONG,
FORMAT_TYPE_ULONG,
FORMAT_TYPE_LONG,
FORMAT_TYPE_UBYTE,
FORMAT_TYPE_BYTE,
FORMAT_TYPE_USHORT,
FORMAT_TYPE_SHORT,
FORMAT_TYPE_UINT,
FORMAT_TYPE_INT,
FORMAT_TYPE_SIZE_T,
FORMAT_TYPE_PTRDIFF

and in my test cleanup, I've cut this down to just four cases:
FORMAT_STATE_xBYTE (x = 1, 2, 4, 8).

And the actual argument fetch for the *normal* case is actually just
two cases (because the 8-byte and the "4 bytes or less" cases are
different for va_list, but 1/2/4 bytes are just that single case).

But the binary printf argument save/fetch is not able to be cut down
to just two cases because of how it does that odd "ostensibly a word
array, but packed byte/short fields" thing.

Oh well. It's not a big deal. But I was doing this to see if I could
regularize the vsnprintf() logic and make sharing better - and then
just the binary version already causes unnecessary duplication.

If the *only* thing that accesses that word array is vbin_printf and
bstr_printf, then I could just change the packing to act like va_list
does (ie the word array would *actually* be a word array, and char and
short values would get individual words).

But at least the bpf cde seems to know about the crazy packing, and
also does that

tmp_buf = PTR_ALIGN(tmp_buf, sizeof(u32));

in bpf_bprintf_prepare() because it knows that it's not *actually* an
array of words despite it being documented as such.

Of course, the bpf code only does the packed access thing for '%c',
and doesn't seem to know that the same is true of '%hd' and '%hhd',
presumably because nobody actually uses that.

Let's add Alexei to the participants. I think bpf too would actually
prefer that the odd char/short packing *not* be done, if only because
it clearly does the wrong thing as-is for non-%c argument (ie those
%hd/%hhd cases).

Who else accesses that odd "binary printed pseudo-word array"?

Linus