Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()

From: Heiko Carstens

Date: Mon May 04 2026 - 09:27:24 EST


On Mon, May 04, 2026 at 01:22:39PM +0200, Peter Zijlstra wrote:
> On Fri, May 01, 2026 at 12:40:06PM +0200, Peter Zijlstra wrote:
>
> > Anyway, I had a poke around with godbolt, and the below seems to
> > generate the best code for things like x86_64 and arm64.
> >
> > Specifically, the __builtin_mul_overflow() already has to compute the
> > 128 bit product anyway for most architectures, so using that directly
> > then leads to saner asm and easier to understand code.
> >
> > AFAICT HPPA64 is the only 64bit architecture that doesn't implement
> > __int128 and will thus be demoted to doing what we do on 32bit.
>
> I forgot we had ARCH_SUPPORTS_INT128, and I suppose this had better
> check that. Now, s390 is a bit weird and excludes GCC even though that
> definitely supports __int128. Supposedly there was an issue, but perhaps
> modern GCC has this fixed?

The reason was not a bug (in terms of incorrect code), but that gcc generated
a stack frame larger than 6kb for one of the crypto functions - see commit
fbac266f095d ("s390: select ARCH_SUPPORTS_INT128"). That's just too large to
be acceptable.

If I remember correctly, gcc generated code which did not reuse stack slots
known to be unused but, for whatever reason, created a new stack slot for
every variable, which then resulted in such a huge stack frame. With clang
the stack frame size was only 1.5kb.

I just checked: with gcc 15.2.0 we are down to 4.5kb. Still too large :)

Adding s390 compiler folks, though I seem to remember discussing this with
them back then.
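
For the compiler folks, who may be missing the context: my understanding of
the approach quoted above is roughly the following. This is only a standalone
sketch and not the actual patch; avg_ge_product() is a made-up name, and
__SIZEOF_INT128__ merely stands in for the kernel's ARCH_SUPPORTS_INT128
check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: return true if
 * avg >= key * load, evaluated without overflowing 64 bits.
 */
static bool avg_ge_product(int64_t avg, int64_t key, int64_t load)
{
#ifdef __SIZEOF_INT128__
	/* Compare against the full 128 bit product. */
	return (__int128)avg >= (__int128)key * load;
#else
	int64_t prod;

	/*
	 * On overflow the true product lies outside the 64 bit range, so
	 * only its sign matters: it is below any avg if the signs of the
	 * factors differ, and above any avg otherwise.
	 */
	if (__builtin_mul_overflow(key, load, &prod))
		return (key < 0) != (load < 0);

	return avg >= prod;
#endif
}
```

Both branches should give the same answer for all inputs; the question for
s390 is only which of them gcc is allowed to use.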