Re: div_u64/do_div stack size usage, was Re: [v3] block: Removed a warning while compiling with a cross compiler for parisc
From: Arnd Bergmann
Date: Tue Jul 06 2021 - 15:24:41 EST
On Tue, Jul 6, 2021 at 7:03 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 06, 2021 at 05:30:54PM +0200, Abd-Alrhman Masalkhi wrote:
> > Thank you for your comment, the div_u64 function is called 5 times
> > inside diskstats_show function, so I have made a test case; I have
> > replaced one call with a constant number then I have compiled the
> > kernel, the result was instead of emitting "the frame size of 1656
> > bytes is larger than 1280 bytes" warning, it has emitted "the frame
> > size of 1328 bytes is larger than 1280 bytes" warning, so I came to the
> > conclusion that each call to div_u64 will add 328 bytes to the stack
> > frame of diskstats_show function, since it is an inlined function. So I
> > thought the solution might be to prevent div_u64 from being
> > inlined in diskstats_show function.
>
> Adding a bunch of relevant parties to the CC list - any idea how we
> can make the generic do_div / div_u64 not use up such gigantic amounts
> of stack?
I've seen variations of this problem many times, though usually not
involving do_div().
My guess is that this is happening here because of a combination of
things; most of the time it doesn't get nearly as bad:
- parisc has larger stack frames than others
- ilog2() as used in __div64_const32() is somewhat unreliable: it may
decide via __builtin_constant_p() that its input is constant, but then
still produce code for the non-constant case when the caller is
only partially inlined
- Some compiler options make the problem worse by increasing the
pressure on the register allocator.
- Some compiler targets don't deal well with register pressure and
use more stack slots than they really should.
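To illustrate the ilog2() point, here is a minimal user-space sketch of
the pattern, not the kernel's actual macro. The names sketch_ilog2() and
runtime_ilog2() are made up for this example, and it assumes a compiler
that provides the GCC-style builtins. The idea is that one arm is meant
to fold to a constant at compile time and the other is the runtime
fallback; whether the compiler really drops the unused arm depends on
when it can prove the argument constant.

```c
#include <stdint.h>

/* Runtime fallback: index of the highest set bit (n must be non-zero). */
static inline int runtime_ilog2(uint32_t n)
{
	return 31 - __builtin_clz(n);
}

/*
 * Hypothetical stand-in for an ilog2()-style macro: if the argument is
 * known to be constant, use an expression the compiler can fold away;
 * otherwise call the runtime helper. If the constant-ness is only
 * discovered late, code for both arms may still be generated.
 */
#define sketch_ilog2(n)						\
	(__builtin_constant_p(n) ?				\
		((n) < 2 ? 0 : 63 - __builtin_clzll(n)) :	\
		runtime_ilog2(n))
```

Both arms return the same value for a given input, so the difference
only shows up in the generated code and its stack usage, not in the
result.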
If you have the .config file that triggers this and the exact compiler
version, I can have a closer look to narrow down which of these
are part of the problem for that particular file.
One thing we did on ARM OABI (which does not deal well with
64-bit math) was to turn off the inline version of __arch_xprod_64
and instead use an extern function for do_div().
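As a rough user-space sketch of the out-of-line idea (this is not the
ARM code, and div_u64_outofline() and sum_ms() are hypothetical names):
marking the helper noinline means any temporaries the 64-by-32 division
needs live in the helper's own frame instead of being added to every
caller's frame. The divisor and the caller are placeholders chosen only
for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical out-of-line division helper. noinline is a GCC/Clang
 * attribute that stops the compiler from merging this body, and its
 * stack usage, into each call site.
 */
__attribute__((noinline))
uint64_t div_u64_outofline(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Example caller with several divisions; each one is a plain call. */
uint64_t sum_ms(uint64_t a_ns, uint64_t b_ns)
{
	return div_u64_outofline(a_ns, 1000000) +
	       div_u64_outofline(b_ns, 1000000);
}
```

The trade-off is a function call per division, and the compiler can no
longer specialize the division for a constant divisor at the call site.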
Arnd