Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
From: Andy Lutomirski
Date: Wed Mar 31 2021 - 12:54:25 EST
> On Mar 31, 2021, at 9:31 AM, Len Brown <lenb@xxxxxxxxxx> wrote:
>
> On Tue, Mar 30, 2021 at 6:01 PM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
>>> Can we leave it in live registers? That would be the speed-of-light
>>> signal handler approach. But we'd need to teach the signal handler to
>>> not clobber it. Perhaps that could be part of the contract that a
>>> fast signal handler signs? INIT=0 AMX state could simply sit
>>> patiently in the AMX registers for the duration of the signal handler.
>>> You can't get any faster than doing nothing :-)
>>>
>>> Of course part of the contract for the fast signal handler is that it
>>> knows that it can't possibly use XRESTOR of the stuff on the stack to
>>> necessarily get back to the state of the signaled thread (assuming we
>>> even used XSTATE format on the fast signal handler stack, it would
>>> forget the contents of the AMX registers, in this example)
>>
>> gcc will just use the AVX registers for 'normal' code within
>> the signal handler.
>> So it has to have its own copy of all the registers.
>> (Well, maybe you could make the TMX instructions fault,
>> but that would need a nested signal delivered.)
>
> This is true, by default, but it doesn't have to be true.
>
> Today, gcc has an annotation for user-level interrupts
> https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
>
> An analogous annotation could be created for fast signals.
> gcc can be told exactly what registers and instructions it can use for
> that routine.
>
> Of course, this begs the question about what routines that handler calls,
> and that would need to be constrained too.
>
> Today signal-safety(7) advises programmers to limit what legacy signal handlers
> can call. There is no reason that a fast-signal-safety(7) could not be created
> for the fast path.
>
>> There is also the register save buffer that you need in order
>> to long-jump out of a signal handler.
>> Unfortunately that is required to work.
>> I'm pretty sure the original setjmp/longjmp just saved the stack
>> pointer - but that really doesn't work any more.
>>
>> OTOH most signal handlers don't care - but there isn't a flag
>> to sigset() (etc) to ask for a specific register layout.
>
> Right, the idea is to optimize for *most* signal handlers,
> since making any changes to *all* signal handlers is intractable.
>
> So the idea is that opting-in to a fast signal handler would opt-out
> of some legacy signal capabilities. Complete state is one of them,
> and thus long-jump is not supported, because the complete state
> may not automatically be available.
Long jump is probably the easiest problem of all: sigsetjmp() is a *function* that follows the ABI, so its callers already expect it to clobber most or all of the extended state.
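For reference, the pattern under discussion looks roughly like the sketch below. It uses only standard POSIX calls; the name run_once() and the choice of SIGUSR1 are made up for illustration. The point is that sigsetjmp() is reached through an ordinary function call, so the compiler already treats caller-clobbered registers as dead across it.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf env;

/* Handler that abandons the interrupted context and jumps back. */
static void handler(int sig)
{
    (void)sig;
    siglongjmp(env, 1);
}

/* Hypothetical helper: returns 1 if we got back here via siglongjmp(). */
int run_once(void)
{
    struct sigaction sa;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Ordinary call: nothing caller-clobbered is assumed live across it. */
    if (sigsetjmp(env, 1) == 0) {
        raise(SIGUSR1);   /* handler runs and jumps back to sigsetjmp() */
        return 0;         /* only reached if the handler did not jump */
    }
    return 1;             /* arrived here from siglongjmp() */
}
```

Passing 1 as the second argument to sigsetjmp() makes siglongjmp() restore the signal mask too, so SIGUSR1 is unblocked again after the jump.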
But this whole annotation thing will require serious compiler support. We already have problems with compilers inlining functions and getting confused about attributes.
An API like:
    if (get_amx()) {
        use AMX;
    } else {
        don't;
    }
avoids this problem. And making XCR0 dynamic, for all its faults, at least helps force a degree of discipline on user code.
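Fleshed out a little, the shape could be something like the sketch below. Everything here is hypothetical: get_amx() is a stub standing in for whatever the kernel interface ends up being, and dot_amx() is a placeholder with no real tile instructions in it. It only shows that the AMX decision becomes an explicit runtime branch rather than a function attribute the compiler has to track.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for asking the kernel for AMX; always refuses here. */
static bool amx_granted = false;

static bool get_amx(void)
{
    return amx_granted;
}

/* Plain scalar dot product: the "don't" branch. */
static long dot_scalar(const int *a, const int *b, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Placeholder for the AMX path; a real one would use tile instructions. */
static long dot_amx(const int *a, const int *b, size_t n)
{
    return dot_scalar(a, b, n);
}

/* The caller picks a path at runtime based on what get_amx() says. */
long dot(const int *a, const int *b, size_t n)
{
    if (get_amx())
        return dot_amx(a, b, n);
    return dot_scalar(a, b, n);
}
```

Code that never calls get_amx() never touches AMX state, which is the discipline part.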
>
> thanks,
> Len Brown, Intel Open Source Technology Center