RE: Candidate Linux ABI for Intel AMX and hypothetical new related features

From: David Laight
Date: Tue Mar 30 2021 - 18:02:37 EST


From: Len Brown
> Sent: 30 March 2021 21:42
>
> On Tue, Mar 30, 2021 at 4:20 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >
> >
> > > On Mar 30, 2021, at 12:12 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> > >
> > > On 3/30/21 10:56 AM, Len Brown wrote:
> > >> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > >>>> On Mar 30, 2021, at 10:01 AM, Len Brown <lenb@xxxxxxxxxx> wrote:
> > >>>> Is it required (by the "ABI") that a user program has everything
> > >>>> on the stack for user-space XSAVE/XRSTOR to get back
> > >>>> to the state of the program just before receiving the signal?
> > >>> The current Linux signal frame format has XSTATE in uncompacted format,
> > >>> so everything has to be there.
> > >>> Maybe we could have an opt in new signal frame format, but the details
> > >>> would need to be worked out.
> > >>>
> > >>> It is certainly the case that a signal should be able to be delivered,
> > >>> run “async-signal-safe” code, and return, without corrupting register
> > >>> contents.
> > >> And so, an acknowledgement:
> > >>
> > >> We can't change the legacy signal stack format without breaking
> > >> existing programs. The legacy format is uncompacted XSTATE. It is a
> > >> complete set of architectural state -- everything necessary to
> > >> XRSTOR. Further, the sigreturn flow allows the signal handler to
> > >> *change* any of that state, so that it becomes active upon return from
> > >> signal.
> > >
> > > One nit with this: XRSTOR itself can work with the compacted format or
> > > uncompacted format. Unlike the XSAVE/XSAVEC side where compaction is
> > > explicit from the instruction itself, XRSTOR changes its behavior by
> > > reading XCOMP_BV. There's no XRSTORC.
> > >
> > > The issue with using the compacted format is when legacy software in the
> > > signal handler needs to go access the state. *That* is what can't
> > > handle a change in the XSAVE buffer format (either optimized/XSAVEOPT,
> > > or compacted/XSAVEC).
> >
> > The compacted format isn’t compact enough anyway. If we want to keep AMX
> > and AVX512 enabled in XCR0 then we need to further muck with the format
> > to omit the not-in-use features. I *think* we can pull this off in a way
> > that still does the right thing wrt XRSTOR.
>
> Agreed. Compacted format doesn't save any space when INIT=0, so it is
> only a half-step forward.
>
> > If we go this route, I think we want a way for sigreturn to understand a
> > pointer to the state instead of inline state to allow programs to change
> > the state. Or maybe just to have a way to ask sigreturn to skip the
> > restore entirely.
>
> The legacy approach puts all architectural state on the signal stack
> in XSTATE format.
>
> If we make the signal stack smaller with a new fast-signal scheme, we
> need to find another place for that state to live.
>
> It can't live in the task context switch buffer. If we put it there
> and then take an interrupt while running the signal handler, then we'd
> overwrite the signaled thread's state with the signal handler's state.
>
> Can we leave it in live registers? That would be the speed-of-light
> signal handler approach. But we'd need to teach the signal handler to
> not clobber it. Perhaps that could be part of the contract that a
> fast signal handler signs? INIT=0 AMX state could simply sit
> patiently in the AMX registers for the duration of the signal handler.
> You can't get any faster than doing nothing :-)
>
> Of course, part of the contract for the fast signal handler is that it
> knows it can't necessarily use XRSTOR of the stuff on the stack to
> get back to the state of the signaled thread. (Even if we used XSTATE
> format on the fast signal handler stack, it would not include the
> contents of the AMX registers, in this example.)
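To put rough numbers on the compacted-versus-uncompacted point quoted above, here is a small sketch that computes XSAVE buffer sizes in both formats. The per-component sizes and standard-format offsets in the table are assumed, illustrative CPUID leaf 0xD values for an AVX-512 + AMX machine (MPX omitted); real code must read them from CPUID.(EAX=0DH,ECX=i), including the 64-byte alignment flag in ECX bit 1. The names `xcomps`, `xsave_size_standard()` and `xsave_size_compacted()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative CPUID leaf 0xD values (ASSUMED, not queried): real code
 * must use CPUID.(EAX=0DH,ECX=i): EAX = size, EBX = standard offset,
 * ECX[1] = 64-byte alignment in the compacted format. */
struct xcomp {
    unsigned bit;     /* XSTATE component number (bit in XCR0/XCOMP_BV) */
    uint32_t size;    /* component size in bytes */
    uint32_t offset;  /* fixed offset in the standard (uncompacted) format */
    int align64;      /* 1 if 64-byte aligned in the compacted format */
};

/* Must be sorted by ascending bit: compacted layout follows bit order. */
static const struct xcomp xcomps[] = {
    {  2,  256,  576, 0 },  /* AVX (upper YMM halves) */
    {  5,   64, 1088, 0 },  /* AVX-512 opmask */
    {  6,  512, 1152, 0 },  /* AVX-512 ZMM_Hi256 */
    {  7, 1024, 1664, 0 },  /* AVX-512 Hi16_ZMM */
    {  9,    8, 2688, 0 },  /* PKRU */
    { 17,   64, 2752, 0 },  /* AMX XTILECFG */
    { 18, 8192, 2816, 0 },  /* AMX XTILEDATA */
};
#define NCOMPS (sizeof xcomps / sizeof xcomps[0])

/* 512-byte legacy x87/SSE region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576u

/* Standard format: each component lives at its fixed offset, so the
 * buffer must extend to the end of the highest enabled component. */
static uint32_t xsave_size_standard(uint64_t mask)
{
    uint32_t end = XSAVE_LEGACY_AND_HEADER;
    for (size_t i = 0; i < NCOMPS; i++) {
        if (mask & (1ull << xcomps[i].bit)) {
            uint32_t e = xcomps[i].offset + xcomps[i].size;
            if (e > end)
                end = e;
        }
    }
    return end;
}

/* Compacted format (XSAVEC/XSAVES): every component set in XCOMP_BV is
 * packed after the previous one, whether or not it is in its init state. */
static uint32_t xsave_size_compacted(uint64_t xcomp_bv)
{
    uint32_t off = XSAVE_LEGACY_AND_HEADER;
    for (size_t i = 0; i < NCOMPS; i++) {
        if (xcomp_bv & (1ull << xcomps[i].bit)) {
            if (xcomps[i].align64)
                off = (off + 63u) & ~63u;  /* round up to 64 bytes */
            off += xcomps[i].size;
        }
    }
    return off;
}
```

Under these assumed values the full feature set needs 11008 bytes uncompacted and 10696 bytes compacted: compaction only squeezes out the 312 bytes of holes, because layout depends on XCOMP_BV rather than on which components are actually in use. Leaving XTILEDATA out of the mask, roughly what "omit the not-in-use features" would buy while the tiles are idle, brings the compacted size down to 2504 bytes.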

gcc will just use the AVX registers for 'normal' code within
the signal handler.
So the signal frame has to hold its own copy of all the registers.
(Well, maybe you could make the AMX tile instructions fault,
but that would require delivering a nested signal.)
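A minimal sketch, assuming x86-64 Linux with glibc, of what the legacy frame gives every handler today: `uc_mcontext.fpregs` points at a complete, uncompacted XSAVE image of the interrupted context, saved before the handler runs, so the handler's own AVX usage cannot corrupt it. `struct sw_bytes_mirror` is a local, hypothetical copy of the kernel's `struct _fpx_sw_bytes` (stored in the FXSAVE `sw_reserved` bytes at offset 464), declared here to avoid header clashes with `<asm/sigcontext.h>`; the magic values match the kernel's `FP_XSTATE_MAGIC1`/`FP_XSTATE_MAGIC2`. `inspect_frame()` and `probe_signal_frame()` are illustrative names.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Local mirror (hypothetical name) of the kernel's struct _fpx_sw_bytes,
 * found at byte 464 of the 512-byte FXSAVE area in the signal frame. */
struct sw_bytes_mirror {
    uint32_t magic1;         /* XSTATE_MAGIC1 when XSAVE data follows */
    uint32_t extended_size;  /* xstate_size + 4 bytes for magic2 */
    uint64_t xfeatures;      /* XSTATE components present in the frame */
    uint32_t xstate_size;    /* bytes of XSAVE data in the frame */
    uint32_t padding[7];
};

#define SW_BYTES_OFFSET 464
#define XSTATE_MAGIC1   0x46505853u  /* same value as FP_XSTATE_MAGIC1 */
#define XSTATE_MAGIC2   0x46505845u  /* same value as FP_XSTATE_MAGIC2 */

/* Results copied out of the handler; read only after raise() returns. */
static volatile uint32_t seen_magic1, seen_magic2, seen_xstate_size;
static volatile uint64_t seen_xfeatures;

static void inspect_frame(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
#if defined(__x86_64__) && defined(__linux__)
    const ucontext_t *uc = ctx;
    /* Start of the saved (uncompacted) XSAVE area on the signal stack. */
    const char *fp = (const char *)uc->uc_mcontext.fpregs;
    struct sw_bytes_mirror sw;
    uint32_t m2;

    if (fp == NULL)
        return;
    memcpy(&sw, fp + SW_BYTES_OFFSET, sizeof sw);
    seen_magic1 = sw.magic1;
    seen_xfeatures = sw.xfeatures;
    seen_xstate_size = sw.xstate_size;
    if (sw.magic1 == XSTATE_MAGIC1) {
        /* magic2 marks the end of the XSAVE data in the frame. */
        memcpy(&m2, fp + sw.xstate_size, sizeof m2);
        seen_magic2 = m2;
    }
#else
    (void)ctx;
#endif
}

/* Installs the SA_SIGINFO handler and sends SIGUSR1 to ourselves.
 * Returns 0 on success, non-zero on failure. */
static int probe_signal_frame(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = inspect_frame;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return raise(SIGUSR1);
}
```

On a kernel and CPU that use XSAVE, `seen_xstate_size` is the amount of XSAVE data the kernel placed on the signal stack for this process, and `seen_xfeatures` always includes the x87 and SSE bits.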

There is also the register save buffer that you need in order
to long-jump out of a signal handler.
Unfortunately that is required to work.
I'm pretty sure the original setjmp/longjmp just saved the stack
pointer - but that really doesn't work any more.
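A minimal sketch (function and variable names hypothetical) of the case that has to keep working: a handler that leaves through `siglongjmp()` instead of returning. The `sigjmp_buf` filled by `sigsetjmp()` is one such register save buffer; on x86-64 glibc it records the callee-saved general registers, the stack pointer and the resume address, and with a non-zero `savemask` also the signal mask, rather than the full XSTATE.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf escape_point;            /* saved registers + signal mask */
static volatile sig_atomic_t escaped_sig;  /* which signal made us jump */

static void escape_handler(int sig)
{
    escaped_sig = sig;
    siglongjmp(escape_point, 1);  /* resume at the sigsetjmp() below */
}

/* Raises SIGUSR2 and returns the signal number the handler long-jumped
 * out with, 0 if the handler returned normally, or -1 on error. */
static int run_with_escape(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = escape_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    escaped_sig = 0;
    if (sigsetjmp(escape_point, 1) != 0)  /* 1: also save the signal mask */
        return escaped_sig;                /* arrived here via siglongjmp() */

    raise(SIGUSR2);
    return 0;  /* only reached if the handler returned instead of jumping */
}
```

Passing 1 as `savemask` matters here: the handler runs with SIGUSR2 blocked, and `siglongjmp()` restores the mask saved by `sigsetjmp()`, so the signal is deliverable again afterwards. With a `savemask` of 0 it would stay blocked after the jump.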

OTOH, most signal handlers don't care, but there isn't a flag
to sigset() (etc.) to ask for a specific register layout.

I did have 'fun' changing the x86 segment registers so that
the return to user faulted in the kernel during the last bit
of the 'return to user' path, and then fixing the fallout.

David
