Re: [PATCH v9 08/26] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically

From: Bae, Chang Seok
Date: Mon Aug 30 2021 - 19:39:40 EST


On Aug 30, 2021, at 10:45, Hansen, Dave <dave.hansen@xxxxxxxxx> wrote:
<snip>
> On 7/30/21 7:59 AM, Chang S. Bae wrote:
>>
>> +	/*
>> +	 * The minimum buffer size excludes the dynamic user state. When a
>> +	 * task uses the state, the buffer can grow up to the max size.
>> +	 */
>> +	if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
>> +		return get_xstate_config(XSTATE_MIN_SIZE);
>> +	else if (mask == xfeatures_mask_all)
>> +		return get_xstate_config(XSTATE_MAX_SIZE);
>
> Is this just an optimization? It seems redundant with everything below.
> I think that adds to the confusion.

Boris suggested removing the code below instead [1]:

"So leave only the first two which are obvious and are more likely to
happen - the first one is going to be the most likely on non-dynamic
setups and the second one is on dynamic systems."

>> +	nr = fls64(mask) - 1;
>
> "nr" is a really, really, confusing name for this. "last_feature_nr"
> might be better. Otherwise, this might be read as "number of features".
> Comment might have helped, had there been any.

Yes, it seems to be the case.
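
To illustrate what the value means, here is a minimal userspace sketch. It assumes the kernel's fls64() convention (1-based position of the most significant set bit, 0 for an empty mask); fls64_sketch() and last_feature_nr() are hypothetical names used only for this example, not code from the patch.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls64(): returns the 1-based
 * position of the most significant set bit, or 0 when x is 0.
 */
static int fls64_sketch(uint64_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Hypothetical helper: index of the highest-numbered feature present
 * in mask, i.e. what the patch stores in "nr". Returns -1 for an
 * empty mask.
 */
static int last_feature_nr(uint64_t mask)
{
	return fls64_sketch(mask) - 1;
}
```

With a mask of 0x7 (features 0 to 2) this yields 2, which is the index of the last feature and not the number of features (3).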

>> +	if (!boot_cpu_has(X86_FEATURE_XSAVES))
>> +		return xstate_offsets[nr] + xstate_sizes[nr];
>
> Doesn't xstate_comp_offsets[] also work for non-compacted features?
> setup_xstate_comp_offsets() says so and __raw_xsave_addr() depends on
> that behavior.

Yes, but I think using xstate_comp_offsets[] instead of xstate_offsets[] for
the non-compacted format here would only add confusion.

>> +	if ((xfeatures_mask_all & (BIT_ULL(nr + 1) - 1)) == mask)
>> +		return xstate_comp_offsets[nr] + xstate_sizes[nr];
>
> OK, so this is basically saying, "Is the size I'm looking for already
> calculated and stored in xstate_comp_offsets[] because the mask is a
> subset of xfeatures_mask_all". Right?
>
> I guess that works. But, that's a *LOT* of logic to go uncommented.

Boris suggested simplifying the function by removing this [2]:

    > But it might be better to simplify this hunk for readability. I
    > suspect its call sites are not that performance-critical.
    That's *exactly* what I'm driving at!

And I applied that in v10 [3].
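
For readers following the removed check, a minimal userspace sketch of the mask test is below. It only reproduces the bit arithmetic from the v9 hunk; mask_is_enabled_prefix() is a hypothetical name, BIT_ULL() is redefined locally, and the sketch assumes nr < 63 so that the shift stays defined.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)	(1ULL << (n))

/*
 * Hypothetical helper mirroring the v9 condition: true when "mask"
 * contains every enabled feature from bit 0 up to and including bit
 * "nr", and nothing else. Assumes 0 <= nr < 63.
 */
static bool mask_is_enabled_prefix(uint64_t enabled, uint64_t mask, int nr)
{
	return (enabled & (BIT_ULL(nr + 1) - 1)) == mask;
}
```

When the test holds, the requested features occupy the same positions as in the full compacted layout, which is why the v9 code could reuse xstate_comp_offsets[nr] + xstate_sizes[nr]. If a lower-numbered enabled feature is missing from the mask, the test fails and the size has to be summed up instead.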

>> +	/*
>> +	 * With the given mask, no relevant size is found so far. So,
>> +	 * calculate it by summing up each state size.
>> +	 */
>> +	for (size = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
>> +		if (!(mask & BIT_ULL(i)))
>> +			continue;
>> +
>> +		if (xstate_aligns[i])
>> +			size = ALIGN(size, 64);
>> +		size += xstate_sizes[i];
>> +	}
>> +	return size;
>> +}
>
> OK, so this finally reveals something important about the function. It
> is *trying* to avoid running this loop. All of the above is really just
> optimizations to try and avoid doing this loop.
>
> That makes me wonder why you chose that particular set of optimizations.
> It also makes me wonder if they're even necessary.
>
> So, first of all, why is this a new loop? Can't it share code with the
> XSAVE setup code? That code also calculates the amount of space needed
> for an XSAVE buffer given a mask.

This runtime function uses the recorded offset, size, and alignment values
instead of executing CPUID, whereas the loop in the setup function reads the
values from CPUID.
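
As a rough illustration of the table-driven calculation, here is a minimal userspace sketch. The per-feature sizes and alignment flags in the tables are made-up example values, not real CPUID data, and calc_xstate_size() and align_up() are hypothetical names; only the legacy area size (512 bytes), the header size (64 bytes), and the first extended feature index (2) follow the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define FXSAVE_SIZE		512	/* legacy FXSAVE area */
#define XSAVE_HDR_SIZE		64	/* XSAVE header */
#define FIRST_EXTENDED_XFEATURE	2
#define NR_EXAMPLE_FEATURES	5

/* Made-up recorded values; the kernel fills its tables from CPUID at boot. */
static const unsigned int xstate_sizes[NR_EXAMPLE_FEATURES] = { 0, 0, 256, 100, 8 };
static const bool xstate_aligns[NR_EXAMPLE_FEATURES] = { false, false, false, false, true };

/* Round x up to the next multiple of a, where a is a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Sum up the compacted-format buffer size for the features in mask,
 * using only the recorded tables (no CPUID at runtime). Bits at or
 * above NR_EXAMPLE_FEATURES are ignored in this sketch.
 */
static unsigned int calc_xstate_size(uint64_t mask)
{
	unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
	int i;

	for (i = FIRST_EXTENDED_XFEATURE; i < NR_EXAMPLE_FEATURES; i++) {
		if (!(mask & BIT_ULL(i)))
			continue;

		if (xstate_aligns[i])
			size = align_up(size, 64);
		size += xstate_sizes[i];
	}
	return size;
}
```

With these example tables, a mask of 0x1f gives 576 + 256 + 100 = 932, rounded up to 960 for the aligned feature 4, plus 8, for a total of 968 bytes.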

> Second, which of those optimizations do we *need*? I worry that this is
> trying to be way too generic and be *optimized* for being generic code
> when it will never really get random masks as input.
>
> For instance, who is going to be calling this with
> mask!=xfeatures_mask_all with !boot_cpu_has(X86_FEATURE_XSAVES)? That
> seems rather improbable.

This function is intended to help the dynamic state allocation function and
some others. Avoiding the loop might be helpful in the future, especially when
other dynamic states are enabled.

v10 now has a much-trimmed version [3], as that optimization is not needed for
AMX enabling.

Thanks,
Chang

[1]: https://lore.kernel.org/lkml/YRzSuC25eHEOgj6h@xxxxxxx/
[2]: https://lore.kernel.org/lkml/YRZDu2Rk+KdRhh1U@xxxxxxx/
[3]: https://lore.kernel.org/lkml/20210825155413.19673-10-chang.seok.bae@xxxxxxxxx/