RE: [PATCH v2] overflow: optimize struct_size() calculation

From: Vincent Mailhol
Date: Fri Sep 13 2024 - 00:48:36 EST


[resend] I do not know why but below message got blocked for, I quote:

Your message looked spammy to us. Please check
https://subspace.kernel.org/etiquette.html and resend.

And I have no clue which part I violated. Maybe it is the gmail web
client? Resending with the git client, hoping this time it will work.

On Tue. 12 sept. 2024 at 01:46, David Laight <David.Laight@xxxxxxxxxx> wrote:
> ...
> > > [1] Both the '+' and '*' have extra code to detect overflow and return
> > > a 'big' value that will cause kmalloc() to return NULL.
> > > I've not looked at the generated code but it is likely to be horrid
> > > (especially the check for multiply overflowing).
> > > In this case there are enough constants that the alternative check:
> > > if (count > (MAX_SIZE - sizeof (*s)) / sizeof (s->member))
> > > size = MAX_SIZE;
> > > else
> > > size = sizeof (*s) + count * sizeof (s->member);
> > > is fine and only has one conditional in it.
> > > In some cases the compiler may already know that the count is small enough.
> >
> > Indeed. If count is small enough, the code isn't that horrid. If I
> > take this example:
> >
> > size_t foo(u32 count)
> > {
> > return struct_size_t(struct s, fam, count);
> > }
> >
> > I got this code:
>
> What happens if the flex-array is larger than 1 byte - so a multiply is needed.
> Probably worth testing something that isn't a power of 2.
> Also check 32bit archs -godbolt can help.

Here it is:

https://godbolt.org/z/dKdvW631e

The original is on the right, my patch v2 is on the left. I did not
add the optimization of only doing the max() if sizeof(struct) !=
offsetof().

It always takes time to copy the macros and make them work in godbolt,
so I was just a bit lazy in my previous message.

If the fam not being a power of 2 does not change things so much. At
the end, it is just a problem whether:

sizeofof(*s) + sizeof(*s->fam) * type_max(count)

wraps around or not.

Of course, there are a few cases in which the multiplication will not
warp, but only the addition will. This always occurs when the fam
element is one byte and the element count has the same type width as
size_t, but it could also happen on degenerated cases as illustrated
in warp_only_on_add_fam32() (I can not think of a more realistic
example).

I will do a few more tests and maybe come up with a v3 depending on
what I found.

> David
>
> >
> > 0000000000000010 <foo>:
> > 10: f3 0f 1e fa endbr64
> > 14: 89 f8 mov %edi,%eax
> > 16: 48 83 c0 10 add $0x10,%rax
> > 1a: e9 00 00 00 00 jmp 1f <foo+0xf>
>
> Add -d to objdump to get the relocation printed.

What I previously posted was the result of:

objdump -d foo.o

> (I think this is a build where 'ret' isn't allowed :-)

I just did a defconfig build. I rarely try to tweak this part of the
config. Now that I did the gobolt, I will not investigate more on the
config.


Yours sincerely,
Vincent Mailhol