I thought we had reached closure on the previous thread. But no problem,
better late than never.
xsave and xrstor are performance-sensitive instructions, as they are used
in process context switches. The xsave image doesn't have to describe itself:
at any time, one can get all the xsave-relevant layout information using
cpuid. And when needed, SW can always pass extra information along with the
xsave image.
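
To illustrate the cpuid point, here is a minimal user-space sketch of
reading one component's size and offset from CPUID leaf 0Dh. It assumes a
GCC or clang toolchain that ships <cpuid.h> with __get_cpuid_count(); the
struct and function names are hypothetical and not taken from any kernel
code.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical holder for one state component's placement in the xsave area. */
struct xstate_component {
    uint32_t size;    /* CPUID.(EAX=0Dh, ECX=i):EAX */
    uint32_t offset;  /* CPUID.(EAX=0Dh, ECX=i):EBX */
};

/*
 * Query size and offset of state component i (i >= 2) in the standard,
 * non-compacted xsave layout.  Returns 0 on success, -1 if the leaf or
 * the component is not available (or when not built for x86).
 */
static int xstate_query_component(unsigned int i, struct xstate_component *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Components 0 and 1 (x87, SSE) live in the legacy 512-byte region. */
    if (i < 2 || !__get_cpuid_count(0x0d, i, &eax, &ebx, &ecx, &edx))
        return -1;
    if (eax == 0)   /* a zero size means the component is not supported */
        return -1;

    out->size = eax;
    out->offset = ebx;
    return 0;
#else
    (void)i;
    (void)out;
    return -1;
#endif
}
```

Since these values are fixed for a given processor, a consumer of the
signal frame could query them once and cache them rather than finding them
in every frame.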
- Magic number (M2)
As I mentioned earlier, we can avoid this magic number by including
a pointer (which points to the start of the fp and xstate on the stack) along
with M1. This will catch anyone copying the FP state of the frame without
being aware of the xstate.
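
A rough sketch of the pointer-cookie idea, for illustration only. It
assumes the extra fields are stored in bytes 464..511 of the legacy
512-byte area, which fxsave leaves to software; the M1 value, the struct
layout and the function names are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SW_BYTES_OFFSET 464          /* bytes 464..511 are left to software */
#define HYP_MAGIC1      0x58535431u  /* hypothetical value for M1 */

/* Hypothetical software-defined fields, 16 bytes in total. */
struct sw_info {
    uint32_t magic1;      /* M1 */
    uint32_t total_size;  /* length of fp state plus extended state */
    uint64_t self;        /* address the image was originally saved at */
};

/* Stamp M1, the length and the self-pointer into a freshly saved image. */
static void sw_info_stamp(void *image, uint32_t total_size)
{
    struct sw_info sw = { HYP_MAGIC1, total_size, (uint64_t)(uintptr_t)image };

    memcpy((uint8_t *)image + SW_BYTES_OFFSET, &sw, sizeof(sw));
}

/*
 * Return 1 if the extended state behind the legacy area can be trusted,
 * 0 if M1 is missing or the image no longer sits where it was saved.
 */
static int sw_info_extended_state_valid(const void *image)
{
    struct sw_info sw;

    memcpy(&sw, (const uint8_t *)image + SW_BYTES_OFFSET, sizeof(sw));
    if (sw.magic1 != HYP_MAGIC1)
        return 0;
    return sw.self == (uint64_t)(uintptr_t)image;
}
```

An application that copies only the first 512 bytes elsewhere carries M1
and the stale pointer with it, so the check fails and only the legacy state
would be restored. The check is conservative: a complete copy placed at a
new address fails it as well.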
- Descriptor count (DC)
- DC * <EBX, EAX> from CPUID leaf 0Dh
As you mentioned, this doesn't change after a kernel boot. So do we really
need to save this static information on every signal? (also please see below
about the compaction).
- Possibly a checksum or CRC of this structure
Note that this tail structure will always be the same on a given kernel, so it can be pre-canned at boot time. This tail structure serves two purposes:
- It can be used to verify against truncation of the state.
(I.e. if an XSAVE-unaware application tries to copy and save away
a state and later restore it, but only copies the first 512 bytes
and later just puts a pointer to it.)
As I mentioned above, a pointer along with M1 should be enough to catch this?
- It can be used to verify against an alien state (saved and restored
from another CPU, or even just another kernel version with different
support.)
Though the xsave layout is extensible, the save area is not
compacted if some features are not supported by the processor and/or
system software. This is documented in Vol 2b under the "xsave"
instruction.
Given that the descriptor offsets don't change, we can
achieve the same thing with a bit mask representing the state in
the xsave layout. xrstor with the appropriate bit masks will automatically
restore/init the state.
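
As a sketch of why a bit mask is enough when offsets are fixed: given the
mask and a per-component offset/size table (obtained once from CPUID leaf
0Dh), the minimum image length follows directly, and the same mask is what
xrstor takes in EDX:EAX. The function names and the table type are
hypothetical.

```c
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512u  /* x87/SSE region, components 0 and 1 */
#define XSAVE_HDR_SIZE     64u  /* xsave header that follows it */

/* Hypothetical per-component descriptor, filled in from CPUID leaf 0Dh. */
struct xstate_desc {
    uint32_t offset;
    uint32_t size;
};

/* Minimum image length that covers every component set in mask. */
static uint32_t xstate_min_size(uint64_t mask, const struct xstate_desc *tbl,
                                unsigned int n)
{
    uint32_t end = XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE;

    for (unsigned int i = 2; i < n && i < 64; i++) {
        if ((mask >> i) & 1) {
            uint32_t e = tbl[i].offset + tbl[i].size;

            if (e > end)
                end = e;
        }
    }
    return end;
}

/* Split the 64-bit mask into the EDX:EAX pair that xrstor expects. */
static void xstate_mask_split(uint64_t mask, uint32_t *eax, uint32_t *edx)
{
    *eax = (uint32_t)mask;
    *edx = (uint32_t)(mask >> 32);
}
```

For example, with the AVX component (bit 2) at offset 576 and size 256, a
mask of 0x7 needs an 832-byte image, while a mask of 0x3 needs only the
576 bytes of legacy area plus header.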
- Mismatch on descriptor sizes:
-> Consider that region corrupt and reinitialize?
The region-by-region copy could of course be used even in the same-CPU case, if there turns out to be a negligible performance difference over whole-block copy.
Today in 64-bit, we directly do fxsave/fxrstor in and out of user space
for signal handlers. I would like to retain this behavior as much as possible
with xsave/xrstor as well (and at the same time, provide as much information
as possible for the user to interpret the signal frame). A bit mask
representing the state saved in the xsave image, M1, the length, and some
cookie (a pointer along with M1) to detect image truncation can achieve this,
can't they?