Re: [PATCH v9 08/26] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically

From: Dave Hansen
Date: Mon Aug 30 2021 - 13:45:14 EST


On 7/30/21 7:59 AM, Chang S. Bae wrote:
> +/**
> + * get_xstate_size - Calculate an xstate buffer size

Calculate the amount of space needed to store an xstate buffer with the
given features.

> + * @mask: This bitmap tells which components reserved in the buffer.

The set of components for which the space is needed.

> + * Available once those arrays for the offset, size, and alignment info are
> + * set up, by setup_xstate_features().

Please just say:

Consults values populated in setup_xstate_features(). Must be
called after that setup.


> + * Returns: The buffer size
> + */
> +unsigned int get_xstate_size(u64 mask)
> +{
> + unsigned int size;
> + int i, nr;
> +
> + if (!mask)
> + return 0;
> +
> + /*
> + * The minimum buffer size excludes the dynamic user state. When a
> + * task uses the state, the buffer can grow up to the max size.
> + */
> + if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
> + return get_xstate_config(XSTATE_MIN_SIZE);
> + else if (mask == xfeatures_mask_all)
> + return get_xstate_config(XSTATE_MAX_SIZE);

Is this just an optimization? It seems redundant with everything below.
I think that adds to the confusion.

> + nr = fls64(mask) - 1;

"nr" is a really, really confusing name for this. "last_feature_nr"
might be better. Otherwise, this might be read as "number of features".
A comment would have helped, had there been one.
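
To make the naming point concrete: fls64() returns the 1-based position
of the most significant set bit (0 for an empty mask), so the value
computed here is the index of the highest feature in the mask, not a
count of features. A minimal userspace sketch follows. fls64_sketch()
and last_feature_nr() are illustrative names, not kernel code, and
__builtin_clzll assumes GCC or Clang:

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls64(): 1-based index of the
 * most significant set bit, or 0 when no bit is set.
 */
static int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/*
 * Hypothetical helper: index of the highest feature present in @mask,
 * or -1 for an empty mask.
 */
static int last_feature_nr(uint64_t mask)
{
	return fls64_sketch(mask) - 1;
}
```

For a mask with bits 0 and 5 set, this yields 5 even though only two
features are present, which is exactly why "nr" reads ambiguously.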

> + if (!boot_cpu_has(X86_FEATURE_XSAVES))
> + return xstate_offsets[nr] + xstate_sizes[nr];

Doesn't xstate_comp_offsets[] also work for non-compacted features?
setup_xstate_comp_offsets() says so and __raw_xsave_addr() depends on
that behavior.

> + if ((xfeatures_mask_all & (BIT_ULL(nr + 1) - 1)) == mask)
> + return xstate_comp_offsets[nr] + xstate_sizes[nr];

OK, so this is basically saying, "Is the size I'm looking for already
calculated and stored in xstate_comp_offsets[] because the mask contains
every feature in xfeatures_mask_all up to and including nr?" Right?

I guess that works. But that's a *LOT* of logic to go uncommented.
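
As an illustration of what that check tests, here is a standalone
sketch. The names low_bits_through() and mask_is_enabled_prefix() are
made up for this example, and "enabled" stands in for
xfeatures_mask_all. The check passes only when the mask holds every
enabled feature from bit 0 through nr, with no gaps:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask of bits 0..nr inclusive; avoids the undefined shift by 64. */
static uint64_t low_bits_through(int nr)
{
	return nr >= 63 ? UINT64_MAX : (UINT64_C(1) << (nr + 1)) - 1;
}

/*
 * True when @mask equals the set of enabled features at or below
 * bit @nr. In that case the end of feature @nr in the full compacted
 * layout is also the size needed for @mask.
 */
static bool mask_is_enabled_prefix(uint64_t enabled, uint64_t mask, int nr)
{
	return (enabled & low_bits_through(nr)) == mask;
}
```

With enabled = 0xe7, the mask 0x27 (nr = 5) passes, while 0x23 (also
nr = 5) fails because enabled bit 2 is missing from the mask.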

> + /*
> + * With the given mask, no relevant size is found so far. So,
> + * calculate it by summing up each state size.
> + */
> + for (size = FXSAVE_SIZE + XSAVE_HDR_SIZE, i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
> + if (!(mask & BIT_ULL(i)))
> + continue;
> +
> + if (xstate_aligns[i])
> + size = ALIGN(size, 64);
> + size += xstate_sizes[i];
> + }
> + return size;
> +}

OK, so this finally reveals something important about the function. It
is *trying* to avoid running this loop. Everything above is really just
a set of optimizations to avoid doing this loop.

That makes me wonder why you chose that particular set of optimizations.
It also makes me wonder if they're even necessary.

So, first of all, why is this a new loop? Can't it share code with the
XSAVE setup code? That code also calculates the amount of space needed
for an XSAVE buffer given a mask.
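
For reference, the calculation in question can be sketched as a single
table-driven helper in plain C. This is only an illustration under
stated assumptions: struct xfeature_layout, compacted_size() and the
example_layout[] values are invented for the sketch and are not the
kernel's data structures or real hardware sizes. The 512-byte legacy
area and 64-byte header match FXSAVE_SIZE and XSAVE_HDR_SIZE.

```c
#include <stdint.h>

#define LEGACY_AREA_SIZE	512	/* FXSAVE region */
#define XSAVE_HEADER_SIZE	64
#define FIRST_EXTENDED_NR	2	/* features 0 and 1 live in the legacy area */

/* Hypothetical per-feature layout record. */
struct xfeature_layout {
	unsigned int size;	/* bytes occupied by the feature */
	int align64;		/* nonzero: start on a 64-byte boundary */
};

/* Made-up example table, indexed by feature number. */
static const struct xfeature_layout example_layout[] = {
	{ 0, 0 }, { 0, 0 }, { 256, 0 }, { 40, 0 }, { 0, 0 }, { 100, 1 },
};

/* Sum the compacted-format space needed for the features in @mask. */
static unsigned int compacted_size(uint64_t mask,
				   const struct xfeature_layout *layout,
				   int nr_features)
{
	unsigned int size = LEGACY_AREA_SIZE + XSAVE_HEADER_SIZE;
	int i;

	if (!mask)
		return 0;

	for (i = FIRST_EXTENDED_NR; i < nr_features && i < 64; i++) {
		if (!(mask & (UINT64_C(1) << i)))
			continue;
		if (layout[i].align64)
			size = (size + 63) & ~63u;	/* round up to 64 */
		size += layout[i].size;
	}
	return size;
}
```

A single helper of this shape could, in principle, serve both the
setup-time calculation and any later per-mask query, which is the
sharing being asked about.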

Second, which of those optimizations do we *need*? I worry that this is
trying to be way too generic and be *optimized* for being generic code
when it will never really get random masks as input.

For instance, who is going to be calling this with
mask != xfeatures_mask_all when !boot_cpu_has(X86_FEATURE_XSAVES)? That
seems rather improbable.