Re: [RFC PATCH 13/22] x86/fpu/xstate: Expand dynamic user state area on first use
From: Andy Lutomirski
Date: Wed Oct 14 2020 - 05:20:57 EST
> On Oct 13, 2020, at 3:44 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 10/13/20 3:31 PM, Brown, Len wrote:
>> vmalloc() does not fail, and does not return an error, and so there is no concept
>> of returning a signal.
>
> Well, the order-0 allocations are no-fail, as are the vmalloc kernel
> structures and the page tables that might have to be allocated. But,
> that's not guaranteed to be in place *forever*. I think we still need
> to check for and handle allocation failures, even if they're not known
> to be possible today.
>
>> If we got to the point where vmalloc() sleeps, then the system
>> has bigger OOM issues, and the OOM killer would be on the prowl.
>
> vmalloc() can *certainly* sleep. Allocation failures mean returning
> NULL from the allocator, and the very way we avoid doing that is by
> sleeping to go reclaim some memory from some other allocation.
>
> Sleeping is a normal and healthy part of handling allocation requests,
> including vmalloc().
>
>> If we were concerned about using vmalloc for a couple of pages in the task structure,
>> then we could implement a routine to harvest unused buffers and free them --
>> but that didn't seem worth the complexity. Note that this feature is 64-bit only.
>
> IMNHO, vmalloc() is overkill for ~10k, which is roughly the size of the
> XSAVE buffer for the first AMX implementation. But, it's not overkill
> for the ~66k of space that will be needed if some CPU implementation
> comes along and uses all of the architectural space AMX provides.

I have no problem with vmalloc(), but I do have a problem with vfree() due to the IPIs that result. We need a cache or something.

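To make "a cache or something" concrete, here is a rough userspace sketch, not a proposed patch. malloc() and free() stand in for vmalloc() and vfree(), and every name in it (buf_cache, cache_get, cache_put, CACHE_SLOTS) is made up for illustration. The idea is only that a released buffer gets parked for reuse instead of being freed right away, so the expensive free path runs less often. A real version would need locking and some policy for shrinking the cache.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4            /* hypothetical cache depth */

struct buf_cache {
	void  *slot[CACHE_SLOTS];    /* buffers parked for reuse */
	int    count;                /* number of parked buffers */
	size_t buf_size;             /* all buffers share one size */
	int    frees;                /* how often free() was really called */
};

/* Take a parked buffer if there is one, otherwise allocate a new one. */
static void *cache_get(struct buf_cache *c)
{
	if (c->count > 0)
		return c->slot[--c->count];
	return malloc(c->buf_size);  /* may return NULL; caller must check */
}

/* Park the buffer for reuse; only free it when the cache is full. */
static void cache_put(struct buf_cache *c, void *buf)
{
	if (!buf)
		return;
	if (c->count < CACHE_SLOTS) {
		c->slot[c->count++] = buf;
		return;
	}
	free(buf);                   /* stand-in for vfree() */
	c->frees++;
}
```

With this, a put followed by a get hands back the same buffer and never reaches the free path.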
I have to say: this mechanism is awful. Can we get away with skipping the dynamic XSAVES mess entirely?

What if we instead allocate however much space we need as an array of pages and have one contiguous percpu region? To save, we XSAVE(S or C) just the AMX state to the percpu area and then copy it out to the pages. To restore, we do the inverse. Or would this kill the modified optimization and thus be horrible?
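
Roughly what I have in mind, as a userspace sketch only. memcpy() stands in for the XSAVE(S/C) and XRSTOR(S) instructions, a static array stands in for the percpu region, and all names (bounce, state_save, state_restore, PAGE_SZ) are invented for illustration. It shows the data flow and nothing else; it does not answer whether the modified optimization survives.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Stand-in for the one contiguous percpu region. */
static unsigned char bounce[3 * PAGE_SZ];

/* Save: dump state contiguously into the bounce area, then scatter to pages. */
static int state_save(unsigned char **pages, const unsigned char *regs, size_t len)
{
	if (len > sizeof(bounce))
		return -1;
	memcpy(bounce, regs, len);   /* stand-in for XSAVES/XSAVEC of the AMX state */
	for (size_t off = 0; off < len; off += PAGE_SZ) {
		size_t n = len - off < PAGE_SZ ? len - off : PAGE_SZ;
		memcpy(pages[off / PAGE_SZ], bounce + off, n);
	}
	return 0;
}

/* Restore: gather the pages into the bounce area, then load state from it. */
static int state_restore(unsigned char *regs, unsigned char **pages, size_t len)
{
	if (len > sizeof(bounce))
		return -1;
	for (size_t off = 0; off < len; off += PAGE_SZ) {
		size_t n = len - off < PAGE_SZ ? len - off : PAGE_SZ;
		memcpy(bounce + off, pages[off / PAGE_SZ], n);
	}
	memcpy(regs, bounce, len);   /* stand-in for XRSTOR/XRSTORS */
	return 0;
}
```

The pages never need to be virtually contiguous, so there is no vmalloc() and no vfree() in this scheme; the cost is the extra copy on every save and restore.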