RE: AMDGPU and 16B stack alignment

From: David Laight
Date: Tue Oct 15 2019 - 06:48:42 EST

From: Arnd Bergmann
> Sent: 15 October 2019 08:19
> On Tue, Oct 15, 2019 at 9:08 AM S, Shirish <sshankar@xxxxxxx> wrote:
> > On 10/15/2019 3:52 AM, Nick Desaulniers wrote:
> > My gcc build fails with below errors:
> >
> > dcn_calcs.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > dcn_calc_math.c:1:0: error: -mpreferred-stack-boundary=3 is not between 4 and 12
> >
> > While GPF observed on clang builds seem to be fixed.
> Ok, so it seems that gcc insists on having at least 2^4 bytes stack alignment
> when SSE is enabled on x86-64, but does not actually rely on that for correct
> operation unless it's using sse2. So -msse always has to be paired with
> -mpreferred-stack-boundary=3.
> For clang, it sounds like the opposite is true: when passing 16 byte stack
> alignment and having sse/sse2 enabled, it requires the incoming stack to be
> 16 byte aligned, but passing 8 byte alignment makes it do the right thing.
> So, should we just always pass $(call cc-option, -mpreferred-stack-boundary=4)
> to get the desired outcome on both?

It probably won't solve the problem.
You need to find all the asm blocks that call back into C and ensure they
maintain the required stack alignment.
This might be possible in the kernel, but is almost impossible in userspace.
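
As a rough way to check this from the C side (a minimal sketch, not kernel
code: is_aligned() and callback_sees_aligned_stack() are hypothetical names,
and the GCC/clang 'aligned' attribute is assumed), a function reached from an
asm block can test whether a local it asked to be 16-byte aligned really is.
With a 16-byte preferred boundary the compiler places the local on the
assumption that the incoming stack was already aligned, so a caller that broke
the alignment would typically show up as a 0 here:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical C function that an asm stub calls back into. */
static int callback_sees_aligned_stack(void)
{
    char buf[16] __attribute__((aligned(16)));
    /* Read the address back through a volatile so the check is not folded. */
    volatile uintptr_t addr = (uintptr_t)buf;

    return is_aligned(addr, 16);
}
```

Called through the normal C ABI this should return 1; the interesting case is
calling it from an asm stub that has pushed an odd number of words.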

ISTR that gcc arbitrarily changed the stack alignment for i386 a few years ago.
While it helped code generation, it broke a lot of things.
I can't remember the correct set of options to get the stack realignment
code added only where it is needed (it generates a double %bp frame).
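
For reference, GCC documents two x86 knobs that look like what is being
described: the -mstackrealign option and the per-function
force_align_arg_pointer attribute, both of which emit an alternate prologue
that realigns the stack on entry. Whether those are the options meant above is
an assumption. A minimal sketch of the per-function form, assuming GCC or
clang, with hypothetical names (REALIGN_ENTRY, entry_from_asm):

```c
#include <stdint.h>

/* The attribute is x86-specific, so define it away elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ENTRY
#endif

/*
 * Hypothetical entry point reached from code that may only keep the
 * stack 4- or 8-byte aligned. The realigning prologue is emitted for
 * this function only, so the cost is paid just at the boundary.
 */
REALIGN_ENTRY int entry_from_asm(void)
{
    char buf[16] __attribute__((aligned(16)));
    /* Read the address back through a volatile so the check is not folded. */
    volatile uintptr_t addr = (uintptr_t)buf;

    return (addr & 15) == 0;
}
```

Marking only the functions that are entered from asm keeps the realignment
code out of everything else, which is the "only where it was needed" part.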

