Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET_XCR0 to set XCR0 per-thread

From: Keno Fischer
Date: Tue Apr 07 2020 - 09:53:23 EST


> > It's mentioned elsewhere, but I want to emphasize that the return
> > value of xgetbv is the big one because the dynamic linker uses this.
> > rr trace portability is essentially limited to machines with identical
> > xcr0 values because of it.
>
> I'm thinking just exposing that value is doable in a much less
> objectionable fashion, no?

Hi Peter,

I'm not sure I understand what you're asking,
but let me attempt to provide an answer anyway.
If I'm off the mark in what you would like to know,
please let me know and I'll try my best to get back
to you.

rr's operating principle relies upon every instruction
having deterministic and reproducible behavior,
every time it's executed and across machines.
That means literally bitwise identical updates to the
x86 register state. Most instructions do that given
identical register state - of course some don't by
design like rdtsc. Those instructions get trapped
and emulated (we're very lucky that doing so is
possible for all such instructions of practical
interest on Intel hardware). xcr0 puts us in a bit of
a bind here, because it modifies the user-visible
behavior of instructions (in the three ways I mentioned).
The xgetbv behavior is indeed the most problematic.
If there were a way to selectively trap
xgetbv/xsave/xrstor and emulate them, that would likely
prove sufficient (even just xgetbv may be sufficient,
but I'd have to do further work to validate that).
However, I don't think it's possible to trap these
instructions without also disabling the corresponding
xstate components, which we do not want, since
those instructions do actually need to get executed.
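
To make the xgetbv point concrete, here is a rough sketch of how
userspace can observe xcr0 directly. It is not taken from rr or from
any dynamic linker; the helper names are made up for illustration,
and pick_vector_bits() is a deliberately simplified stand-in for the
kind of decision a dispatcher might base on the result (real code
would also consult the CPUID feature bits). It assumes GCC or Clang
on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */

/* Returns 1 if the OS has enabled XSAVE (CPUID.1:ECX bit 27, OSXSAVE). */
static int os_uses_xsave(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 27) & 1;
}

/*
 * Reads XCR0 with xgetbv (ECX = 0 selects XCR0). The instruction
 * faults if OSXSAVE is not set, so callers must check
 * os_uses_xsave() first.
 */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;

    assert(os_uses_xsave());
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical, simplified dispatcher: derive the widest usable
 * vector width from the enabled xstate components alone.
 *   bit 1 = SSE, bit 2 = AVX, bits 5..7 = AVX-512 state
 */
static int pick_vector_bits(uint64_t xcr0)
{
    if ((xcr0 & 0xe6) == 0xe6)
        return 512;
    if ((xcr0 & 0x06) == 0x06)
        return 256;
    return 128;
}
```

Nothing here goes through the kernel, so a recorded
pick_vector_bits(read_xcr0()) that returned 512 on the recording
machine will return something else on a replay machine whose xcr0
lacks the AVX-512 bits, and execution diverges from there.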

Thanks,
Keno