Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
From: Stefan Schulze Frielinghaus
Date: Mon May 04 2026 - 13:40:55 EST
On Mon, May 04, 2026 at 03:16:09PM +0200, Heiko Carstens wrote:
> On Mon, May 04, 2026 at 01:22:39PM +0200, Peter Zijlstra wrote:
> > On Fri, May 01, 2026 at 12:40:06PM +0200, Peter Zijlstra wrote:
> >
> > > Anyway, I had a poke around with godbolt, and the below seems to
> > > generate the best code for things like x86_64 and arm64.
> > >
> > > Specifically, the __builtin_mul_overflow() already has to compute the
> > > 128 bit product anyway for most architectures, so using that directly
> > > then leads to saner asm and easier to understand code.
> > >
> > > AFAICT HPPA64 is the only 64bit architecture that doesn't implement
> > > __int128 and will thus be demoted to doing what we do on 32bit.
> >
> > I forgot we had ARCH_SUPPORTS_INT128, and I suppose this had better
> > check that. Now, s390 is a bit weird and excludes GCC even though that
> > definitely supports __int128. Supposedly there was an issue, but perhaps
> > modern GCC has this fixed?
>
> The reason was not a bug (in terms of incorrect code), but gcc generated a
> larger than 6kb stack frame for one of the crypto functions - see commit
> fbac266f095d ("s390: select ARCH_SUPPORTS_INT128"). That's just too large to
> be acceptable.
>
> If I remember correctly gcc generated code which did not reuse known to be
> unused stack slots, but created for every variable a new stack slot, for
> whatever reason. Which then resulted in such a huge stack frame. With clang
> the stack frame size was only 1.5kb.
>
> I just checked: with gcc 15.2.0 we are down to 4.5kb. Still too large :)
>
> Adding s390 compiler folks; but I seem to remember I discussed that back then
> with them.

I vaguely remember our discussion. The short story is that I'm not
aware of any change in that area. The long story is that gcc is
conservative when it comes to reusing stack slots for different
objects: if gcc cannot prove that a stack slot is no longer referenced
at a given point, it does not reuse that slot for another object. In
this particular case, due to inlining and IIRC loop unrolling, a huge
number of stack slots had to be allocated and could not be coalesced.
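
To illustrate the slot-sharing point with a made-up example (this is
not the crypto function in question; fill_and_sum(), two_phases() and
BUF_WORDS are hypothetical names picked for the sketch): the two arrays
below have disjoint lifetimes, so a compiler is allowed to place them
in the same stack slot. Whether it actually does so depends on the
compiler, the optimization level and what inlining and unrolling leave
behind. The language does not guarantee it either way.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 64	/* hypothetical size, chosen only for illustration */

/* Hypothetical helper: store seed, seed + 1, ... in buf and return the sum. */
static uint64_t fill_and_sum(uint64_t *buf, size_t n, uint64_t seed)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		buf[i] = seed + i;
		sum += buf[i];
	}
	return sum;
}

uint64_t two_phases(void)
{
	uint64_t total = 0;

	{
		uint64_t a[BUF_WORDS];	/* live only inside this block */

		total += fill_and_sum(a, BUF_WORDS, 1);
	}
	{
		/* a is out of scope here, so b may (or may not) share its slot */
		uint64_t b[BUF_WORDS];

		total += fill_and_sum(b, BUF_WORDS, 2);
	}
	return total;
}
```

Building this with gcc -fstack-usage writes a .su file with the frame
size per function, which is one way to see whether the slots were
shared for a given compiler version and set of flags.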

Cheers,
Stefan