Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
From: Len Brown
Date: Wed Mar 31 2021 - 12:32:23 EST
On Tue, Mar 30, 2021 at 6:01 PM David Laight <David.Laight@xxxxxxxxxx> wrote:
> > Can we leave it in live registers? That would be the speed-of-light
> > signal handler approach. But we'd need to teach the signal handler to
> > not clobber it. Perhaps that could be part of the contract that a
> > fast signal handler signs? INIT=0 AMX state could simply sit
> > patiently in the AMX registers for the duration of the signal handler.
> > You can't get any faster than doing nothing :-)
> >
> > Of course part of the contract for the fast signal handler is that it
> > knows that it can't possibly use XRSTOR of the stuff on the stack to
> > necessarily get back to the state of the signaled thread (assuming we
> > even used XSTATE format on the fast signal handler stack, it would
> > forget the contents of the AMX registers, in this example)
>
> gcc will just use the AVX registers for 'normal' code within
> the signal handler.
> So it has to have its own copy of all the registers.
> (Well, maybe you could make the AMX instructions fault,
> but that would need a nested signal delivered.)
This is true by default, but it doesn't have to be.
Today, gcc has an annotation for user-level interrupts
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
An analogous annotation could be created for fast signals.
gcc can be told exactly what registers and instructions it can use for
that routine.
Of course, this raises the question of which routines that handler calls;
those would need to be constrained too.
Today, signal-safety(7) advises programmers to limit what legacy signal
handlers can call. There is no reason a fast-signal-safety(7) could not be
written for the fast path.
> There is also the register save buffer that you need in order
> to long-jump out of a signal handler.
> Unfortunately that is required to work.
> I'm pretty sure the original setjmp/longjmp just saved the stack
> pointer - but that really doesn't work any more.
>
> OTOH most signal handlers don't care - but there isn't a flag
> to sigset() (etc.) to ask for a specific register layout.
Right, the idea is to optimize for *most* signal handlers,
since making any changes to *all* signal handlers is intractable.
So the idea is that opting in to a fast signal handler would mean opting out
of some legacy signal capabilities. Complete saved state is one of them;
long-jump would therefore not be supported, because the complete state
may not automatically be available.
thanks,
Len Brown, Intel Open Source Technology Center