Candidate Linux ABI for Intel AMX and hypothetical new related features

From: Andy Lutomirski
Date: Fri Mar 26 2021 - 19:13:43 EST


Hi all-

After some discussion on IRC, I have a proposal for a Linux ABI for
using Intel AMX and other similar features. It works like this:

First, we make XCR0 dynamic. This looks a lot like Keno's patch but
with a different API, outlined below. Different tasks can have
different XCR0 values. The default XCR0 for new tasks does not
include big features like AMX. XMM and YMM are still there. The AVX-512
states are debatable -- see below.

To detect features and control XCR0, we add some new arch_prctls:

arch_prctl(ARCH_GET_XCR0_SUPPORT, 0, ...);

returns the set of XCR0 bits supported on the current kernel.

arch_prctl(ARCH_GET_XCR0_LAZY_SUPPORT, 0, ...);

returns 0. See below.

arch_prctl(ARCH_SET_XCR0, xcr0, lazy_states, sigsave_states,
sigclear_states, 0);

Sets xcr0. All states are preallocated except that states in
lazy_states may be unallocated in the kernel until used. (Not
supported at all in v1. lazy_states & ~xcr0 != 0 is illegal.) States
in sigsave_states are saved in the signal frame. States in
sigclear_states are reset to the init state on signal delivery.
States in sigsave_states are restored by sigreturn, and states not in
sigsave_states are left alone by sigreturn.
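
To make the rules above concrete, here is a rough userspace sketch of
the argument checks that ARCH_SET_XCR0 would imply. Nothing here is
kernel code. The helper names (xcr0_request_valid, state_saved_in_frame,
state_cleared_on_delivery) and the XFEATURE_* macro names are made up
for illustration; the bit positions follow the architectural XCR0
layout. The "xcr0 must be a subset of the supported set" check is my
assumption; the lazy_states rule comes straight from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural XCR0 bit positions (macro names are illustrative). */
#define XFEATURE_X87       (1ULL << 0)
#define XFEATURE_SSE       (1ULL << 1)   /* XMM */
#define XFEATURE_AVX       (1ULL << 2)   /* YMM upper halves */
#define XFEATURE_AVX512    (7ULL << 5)   /* opmask, ZMM_Hi256, Hi16_ZMM */
#define XFEATURE_PKRU      (1ULL << 9)
#define XFEATURE_XTILECFG  (1ULL << 17)  /* AMX tile configuration */
#define XFEATURE_XTILEDATA (1ULL << 18)  /* AMX tile data */

/*
 * Hypothetical check of an ARCH_SET_XCR0 request.
 * supported:      what ARCH_GET_XCR0_SUPPORT would return
 * lazy_supported: what ARCH_GET_XCR0_LAZY_SUPPORT would return (0 in v1)
 */
static bool xcr0_request_valid(uint64_t supported, uint64_t lazy_supported,
                               uint64_t xcr0, uint64_t lazy_states)
{
    /* Assumption: a task cannot enable bits the kernel does not support. */
    if (xcr0 & ~supported)
        return false;
    /* From the proposal: lazy_states & ~xcr0 != 0 is illegal. */
    if (lazy_states & ~xcr0)
        return false;
    /* v1 reports no lazy support, so any lazy bit is rejected. */
    if (lazy_states & ~lazy_supported)
        return false;
    return true;
}

/* States in sigsave_states are saved in the frame and restored by sigreturn. */
static bool state_saved_in_frame(uint64_t sigsave_states, uint64_t feature)
{
    return (sigsave_states & feature) == feature;
}

/* States in sigclear_states are reset to the init state on delivery. */
static bool state_cleared_on_delivery(uint64_t sigclear_states, uint64_t feature)
{
    return (sigclear_states & feature) == feature;
}
```

With these checks, a v1 kernel would accept a request such as
xcr0 = X87 | SSE | AVX with lazy_states = 0, and would reject any
request that names a lazy state.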

Optionally we do not support PKRU at all in XCR0 -- it doesn't make
that much sense as an XSAVE feature, and I'm not convinced that trying
to correctly context switch XINUSE[PKRU] is worthwhile. I doubt we
get it right today.

Optionally we come up with a new format for new features in the signal
frame, since the current format is showing its age. Taking 8kB for a
signal with AMX is one thing. Taking another 8kB for a nested signal
if AMX is not in use is worse.

Optionally we make AVX-512 also default off, which fixes what is
arguably a serious ABI break with AVX-512: lots of programs, following
POSIX (!), seem to think that they know how much space to allocate
for sigaltstack(). AVX-512 is too big.

Thoughts?

--Andy