Re: x86: xsave/xrstor support; ucontext_t extensions

From: H. Peter Anvin
Date: Fri Jun 06 2008 - 19:17:54 EST


Suresh Siddha wrote:
I thought we had reached closure on the previous thread. But no problem.
It's better late than never.

I apologize. The last two months have been exceptionally tough.

xsave/xrstor are performance-sensitive instructions, as they are used
in process context switches. The image doesn't have to describe itself:
at any time, one can get all the relevant xsave layout information using
cpuid. And when needed, SW can always pass extra information along with the
xsave image.
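For reference, the per-component layout Suresh refers to comes from CPUID leaf 0Dh: for sub-leaf i >= 2, EAX is the component's size and EBX its offset in the standard (non-compacted) XSAVE area. A minimal userspace sketch, assuming GCC or Clang's `<cpuid.h>` (`__get_cpuid_count`); the struct and the helper name `xstate_query_component` are hypothetical.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* One XSAVE state component as described by CPUID leaf 0Dh, sub-leaf i
 * (i >= 2): EAX = size in bytes, EBX = offset from the start of the
 * standard (non-compacted) XSAVE area. */
struct xstate_component {
    uint32_t offset;
    uint32_t size;
};

/* Returns 0 and fills *out if component i is described, -1 otherwise
 * (sub-leaf is not a component descriptor, leaf unavailable,
 * component unsupported, or not an x86 build). */
static int xstate_query_component(unsigned int i, struct xstate_component *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Sub-leaves 0 and 1 report totals and feature flags, not components. */
    if (i < 2 || !__get_cpuid_count(0x0d, i, &eax, &ebx, &ecx, &edx))
        return -1;
    if (eax == 0)               /* size 0: component not supported */
        return -1;
    out->size = eax;
    out->offset = ebx;
    return 0;
#else
    (void)i;
    (void)out;
    return -1;
#endif
}
```

Since these values are fixed for a given CPU, the kernel only needs to read them once at boot, which is the point being made above.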

Yes, the big problem with it is its monolithic nature (with no realistic alternative instruction).

- Magic number (M2)

As I mentioned earlier, we can avoid this magic number by including
a pointer (which points to the start of the fp and xstate on the stack) along with M1.

As I mentioned before, this introduces a very different constraint, which is a really bad precedent; data shouldn't be dependent on its own location.
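To make the location-dependence objection concrete, here is a minimal sketch of the "pointer along with M1" scheme. The struct layout, the `HYP_MAGIC1` value, and the helper names are all hypothetical; the point is only that a frame which records its own address fails validation after a perfectly faithful copy.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HYP_MAGIC1 0x4d414731u   /* hypothetical M1 value */

/* Hypothetical frame for the "pointer along with M1" proposal: the
 * frame records its own address so a truncated copy can be detected. */
struct ptr_frame {
    uint32_t magic1;
    const void *self;            /* expected to equal the frame's address */
    unsigned char state[64];     /* stand-in for the FP/xstate image */
};

static void ptr_frame_init(struct ptr_frame *f)
{
    f->magic1 = HYP_MAGIC1;
    f->self = f;
    memset(f->state, 0xab, sizeof f->state);
}

/* The check can only pass at the address where the frame was built. */
static bool ptr_frame_valid(const struct ptr_frame *f)
{
    return f->magic1 == HYP_MAGIC1 && f->self == (const void *)f;
}
```

A complete byte-for-byte copy of a valid frame is rejected even though no state is missing, because `self` still names the original location.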

This will catch anyone who copies the FP state of the frame without being
aware of xstate.

- Descriptor count (DC)
- DC * <EBX, EAX> from CPUID leaf 0Dh

As you mentioned, this doesn't change after the kernel boots. So do we really
need to save this static information on every signal? (Also, please see below
about compaction.)

I think given the compaction constraint we're okay with the bitmap plus length of the area.

- Possibly a checksum or CRC of this structure
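The proposed tail (M2, DC, DC descriptors of <EBX, EAX> from CPUID leaf 0Dh, and a checksum) could be pictured as a fixed structure built once. This is a minimal sketch under stated assumptions: the layout, `HYP_MAGIC2`, the `HYP_MAX_DESC` bound, and the helper names are hypothetical, and CRC-32 is just one choice of checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYP_MAGIC2   0x4d414732u  /* hypothetical value for M2 */
#define HYP_MAX_DESC 8u           /* hypothetical bound on DC */

/* One descriptor: <EBX, EAX> from CPUID leaf 0Dh, i.e. offset and size. */
struct xdesc {
    uint32_t offset;
    uint32_t size;
};

/* Hypothetical tail: M2, DC, the descriptors, then a CRC over them. */
struct xtail {
    uint32_t magic2;
    uint32_t count;                   /* DC */
    struct xdesc desc[HYP_MAX_DESC];
    uint32_t crc;                     /* covers every field above */
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t crc = 0xffffffffu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xedb88320u : 0u);
    }
    return ~crc;
}

/* Build the tail once (e.g. at boot); unused slots are zeroed so the
 * CRC is deterministic. Returns -1 if n exceeds the bound. */
static int xtail_build(struct xtail *t, const struct xdesc *d, uint32_t n)
{
    if (n > HYP_MAX_DESC)
        return -1;
    memset(t, 0, sizeof *t);
    t->magic2 = HYP_MAGIC2;
    t->count = n;
    memcpy(t->desc, d, n * sizeof *d);
    t->crc = crc32_ieee(t, offsetof(struct xtail, crc));
    return 0;
}

/* Nonzero if M2, DC and the CRC all check out. */
static int xtail_valid(const struct xtail *t)
{
    return t->magic2 == HYP_MAGIC2 && t->count <= HYP_MAX_DESC &&
           t->crc == crc32_ieee(t, offsetof(struct xtail, crc));
}
```

Because the tail is identical for every signal on a given kernel, a real implementation could equally compare the frame's tail against the boot-time copy instead of recomputing a CRC.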

Note that this tail structure will always be the same on a given kernel, so it can be pre-canned at boot time. This tail structure serves two purposes:

- It can be used to verify against truncation of the state.
(I.e., an XSAVE-unaware application copies and saves away
a state to restore later, but copies only the first 512 bytes
and then just puts in a pointer to that copy.)

As I mentioned above, shouldn't a pointer along with M1 be enough to catch this?

I think the pointer is a really really bad idea. Even if we don't need the structure I think having a tail magic is the better alternative, and it's also really cheap to do since you have to have the length pointer anyway.
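The tail-magic alternative can be sketched as follows: M1 and the length live in the legacy 512-byte area, and M2 is expected exactly `length` bytes in. All names and magic values are hypothetical, and placing the header at offset 464 assumes the FXSAVE bytes 464..511 that are left available to software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYP_MAGIC1    0x4d414731u /* hypothetical M1, inside the legacy area */
#define HYP_MAGIC2    0x4d414732u /* hypothetical M2, right after the state */
#define LEGACY_SIZE   512u        /* size of the FXSAVE image */
#define SW_HDR_OFFSET 464u        /* assumed: bytes 464..511 are software-available */

/* Hypothetical header stored in the legacy area: M1 plus the length. */
struct sw_header {
    uint32_t magic1;
    uint32_t xstate_size;   /* bytes of state; M2 is stored at this offset */
};

/* Stamp M1/length into the legacy area and M2 after the state.
 * buf must hold at least xstate_size + 4 bytes. */
static void frame_seal(unsigned char *buf, uint32_t xstate_size)
{
    struct sw_header h = { HYP_MAGIC1, xstate_size };
    uint32_t m2 = HYP_MAGIC2;

    memcpy(buf + SW_HDR_OFFSET, &h, sizeof h);
    memcpy(buf + xstate_size, &m2, sizeof m2);
}

/* True only if M1 is present and M2 sits where the length says.
 * Nothing here depends on the buffer's address. */
static bool frame_complete(const unsigned char *buf, size_t buf_size)
{
    struct sw_header h;
    uint32_t m2;

    memcpy(&h, buf + SW_HDR_OFFSET, sizeof h);
    if (h.magic1 != HYP_MAGIC1 || h.xstate_size < LEGACY_SIZE)
        return false;
    if ((size_t)h.xstate_size + sizeof m2 > buf_size)
        return false;
    memcpy(&m2, buf + h.xstate_size, sizeof m2);
    return m2 == HYP_MAGIC2;
}
```

Unlike the self-pointer, this accepts a complete copy at any address, while an XSAVE-unaware copy of only the first 512 bytes keeps M1 and the length but loses M2, so it is rejected. The only extra cost over storing the length is one 4-byte compare.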

- It can be used to verify against an alien state (saved and restored
from another CPU, or even just another kernel version with different
support.)

Though the xsave layout is extensible, the save area is not
compacted if some features are not supported by the processor and/or
system software. This is documented in Vol 2b under the "xsave"
instruction.

Ah, you're right, my bad. That does make the problem substantially simpler (I somehow read only the second half of the and/or clause, but it's all there.) So, OK, no need for descriptors (he says, as he waits for the architectural shoe to drop, especially in a multivendor environment.)

Given that the descriptor offsets don't change, we can
achieve the same thing with a bit mask representing the state in
the xsave layout. xrstor with the appropriate bit masks will automatically
restore/init the state.
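A software model of the rule being relied on here: for each requested component, XRSTOR loads it from the image if its bit is set in the saved bitmap (XSTATE_BV) and otherwise puts it into its initial state; unrequested components are left alone. This is not XRSTOR itself. The component table, the name `model_restore`, and the all-zero init value are assumptions (real init values differ, e.g. for the x87 control word).

```c
#include <stdint.h>
#include <string.h>

#define NCOMP 3u

/* Hypothetical component table: fixed offsets that do not move when a
 * feature is absent (non-compacted). Values are illustrative only. */
static const struct { uint32_t offset, size; } comp[NCOMP] = {
    { 0,   512 },   /* legacy FXSAVE region, treated as one component here */
    { 576, 256 },
    { 832, 64  },
};

/* For each requested component: copy it from the image if the saved
 * bitmap says it is present, otherwise reset it (zeroes model init). */
static void model_restore(unsigned char *regs, const unsigned char *image,
                          uint64_t saved_bv, uint64_t requested)
{
    for (unsigned i = 0; i < NCOMP; i++) {
        if (!(requested & (1ull << i)))
            continue;                       /* not requested: untouched */
        if (saved_bv & (1ull << i))
            memcpy(regs + comp[i].offset, image + comp[i].offset,
                   comp[i].size);
        else
            memset(regs + comp[i].offset, 0, comp[i].size);
    }
}
```

Because the offsets are fixed whether or not a feature is present, the bitmap alone tells the restorer which regions carry meaningful data; no per-signal descriptors are needed.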

Agreed.

- Mismatch on descriptor sizes:
-> Consider that region corrupt and reinitialize?

The region-by-region copy could of course be used even in the same-CPU case, if it turns out to have a negligible performance cost compared to a whole-block copy.
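The region-by-region copy with the "size mismatch means corrupt, so reinitialize" policy might look like this. It is a minimal sketch: the descriptor struct, the function name `copy_by_region`, and zero-fill as a stand-in for the init state are assumptions, not an existing kernel interface.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-component descriptor; size 0 means "not present". */
struct xdesc { uint32_t offset, size; };

/* Copy state component by component from an image laid out per src_d
 * into a buffer laid out per dst_d. A component whose size differs
 * between the layouts (or that the source lacks) is treated as corrupt
 * and zero-filled in place of its init state. Returns the number of
 * components reinitialized. */
static unsigned copy_by_region(unsigned char *dst, const struct xdesc *dst_d,
                               const unsigned char *src,
                               const struct xdesc *src_d, unsigned n)
{
    unsigned reinit = 0;

    for (unsigned i = 0; i < n; i++) {
        if (dst_d[i].size == 0)
            continue;                   /* destination has no such region */
        if (src_d[i].size == dst_d[i].size) {
            memcpy(dst + dst_d[i].offset, src + src_d[i].offset,
                   dst_d[i].size);
        } else {
            memset(dst + dst_d[i].offset, 0, dst_d[i].size);
            reinit++;
        }
    }
    return reinit;
}
```

When both layouts are identical (the same-CPU case), every region matches and the result equals a whole-block copy of the covered regions; the only question is whether the per-region loop costs measurably more.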

Today on 64-bit, we directly do fxsave/fxrstor in and out of user space
for signal handlers. I would like to retain this behavior as much as possible
with xsave/xrstor as well (and at the same time, provide as much information
as possible for the user to interpret the signal frame). A bit mask representing
the state saved in the xsave image, M1, the length, and some cookie (a pointer along
with M1) to detect image truncation can achieve this, can't it?

If the state is complete, which it of course will be something like 99.9999% of the time, then doing XRSTOR from user space should work just fine. The case of having to stitch up state is clearly an exceptional case, which is not at all performance critical in any way.
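The split between the common and exceptional paths can be modeled like this. It is a hypothetical sketch: `restore_frame` and its policy are illustrative, a single `memcpy` stands in for one XRSTOR straight from the user frame, and zero-filling everything past the 512-byte legacy image stands in for reinitializing the extended state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LEGACY_SIZE 512u

enum restore_path { RESTORE_DIRECT, RESTORE_STITCHED };

/* Hypothetical policy model. frame_complete would come from a check
 * such as M1 + length + tail magic. xsize must be >= LEGACY_SIZE. */
static enum restore_path restore_frame(unsigned char *state, size_t xsize,
                                       const unsigned char *frame,
                                       bool frame_complete)
{
    if (frame_complete) {
        /* Common case: one bulk restore, nothing to fix up. */
        memcpy(state, frame, xsize);
        return RESTORE_DIRECT;
    }
    /* Rare case: trust only the legacy FXSAVE image, reset the rest. */
    memcpy(state, frame, LEGACY_SIZE);
    memset(state + LEGACY_SIZE, 0, xsize - LEGACY_SIZE);
    return RESTORE_STITCHED;
}
```

Only the completeness check sits on the hot path; the stitching branch can be as slow as it needs to be.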

-hpa