Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
From: Andy Lutomirski
Date: Sat May 22 2021 - 19:56:10 EST
> On May 22, 2021, at 12:17 AM, Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> * Len Brown:
>
>> A. per-task. If we do it this way, then we will likely wind up
>> mandating a GET at the start of every routine in every library that
>> touches AMX, and potentially also a PUT. This is because the library
>> has no idea what thread called it. The plus is that this will address
>> the "used once and sits on a buffer for the rest of the process
>> lifetime" scenario. The minus is that high performance users will be
>> executing thousands of unnecessary system calls that have zero value.
>
> We could revive the KTLS proposal (userspace donates memory for use by
> the kernel & vDSO), and the thread could reserve (on-stack) buffer space
> for kernel use for the duration of the AMX computation. There would be
> a pointer to that space in the KTLS area, set upon entry of the AMX
> region, and cleared upon exit. It's not extremely cheap (unbounded
> alloca has a stack probing loop nowadays). But no system call is
> required.
>
Making this work well would be very nasty. The memory *must* be
available at context-switch-out time, which means it would need to be
pinned at context-switch-in time, which is not great.
But also Intel, in its infinite wisdom, decided to mix “supervisor”
states in with the state that user space is permitted to access
directly. Putting the supervisor state on the stack would be
problematic.
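
For concreteness, a minimal sketch of what the userspace half of the
quoted KTLS scheme might look like. Everything here is hypothetical:
struct ktls_area, amx_region_enter(), amx_region_exit() and
AMX_SAVE_BYTES are names made up for illustration, no such kernel ABI
exists, and the buffer size is a placeholder rather than a value
queried from CPUID. A fixed-size array stands in for the alloca
mentioned in the proposal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Placeholder size (assumption); a real user would query CPUID. */
#define AMX_SAVE_BYTES 8192

/* Hypothetical per-thread area that userspace donates to the kernel. */
struct ktls_area {
    _Atomic(void *) amx_save_buf;  /* NULL outside an AMX region */
    size_t amx_save_len;
};

static _Thread_local struct ktls_area ktls;

/* Entering the AMX region: publish the buffer, no system call. */
static void amx_region_enter(void *buf, size_t len)
{
    ktls.amx_save_len = len;
    /* Store the pointer last so the length is valid once it is visible. */
    atomic_store_explicit(&ktls.amx_save_buf, buf, memory_order_release);
}

/* Leaving the AMX region: withdraw the buffer again. */
static void amx_region_exit(void)
{
    atomic_store_explicit(&ktls.amx_save_buf, NULL, memory_order_release);
    ktls.amx_save_len = 0;
}

/* Returns 1 if the pointer was set inside the region and cleared after. */
int amx_compute_demo(void)
{
    _Alignas(64) unsigned char save[AMX_SAVE_BYTES];  /* on-stack space */

    amx_region_enter(save, sizeof save);
    assert(atomic_load(&ktls.amx_save_buf) == (void *)save);
    /* ... AMX tile instructions would run here ... */
    amx_region_exit();
    return atomic_load(&ktls.amx_save_buf) == NULL;
}
```

The userspace side is cheap. The trouble is on the kernel side, which
would have to write through amx_save_buf at context-switch-out time,
and that is exactly where the pinning requirement comes from.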