Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID

From: Andy Lutomirski
Date: Mon Aug 03 2020 - 15:24:50 EST

> On Aug 3, 2020, at 10:34 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 8/3/20 10:16 AM, Andy Lutomirski wrote:
>> - TILE: genuinely per-thread, but it's expensive so it's
>> lazy-loadable. But the lazy-load mechanism reuses #NM, and it's not
>> fully disambiguated from the other use of #NM. So it sort of works,
>> but it's gross.
>
> For those playing along at home, there's a new whitepaper out from Intel
> about some new CPU features which are going to be fun:
>
>> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> Which part were you worried about? I thought it was fully disambiguated
> from this:
>
>> When XFD causes an instruction to generate #NM, the processor loads
>> the IA32_XFD_ERR MSR to identify the disabled state component(s).
>> Specifically, the MSR is loaded with the logical AND of the IA32_XFD
>> MSR and the bitmap corresponding to the state components required by
>> the faulting instruction.
>>
>> Device-not-available exceptions that are not due to XFD — those
>> resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR
>> MSR.
>
> So if you always make sure to *clear* IA32_XFD_ERR after handling an XFD
> exception, any #NM's with a clear IA32_XFD_ERR are from "legacy"
> CR0.TS=1. Any bits set in IA32_XFD_ERR mean a new-style XFD exception.
>
> Am I missing something?

I don’t think you’re missing anything, but this mechanism seems to be solidly in the category of “just barely works”, kind of like #DB and DR6, which also just barely works.
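
To make the rule Dave describes concrete, here is a rough userspace sketch of the disambiguation. It is only a model, not kernel code: the MSRs and CR0.TS are plain struct fields, every name (sim_cpu, sim_raise_nm, sim_handle_nm) is made up for illustration, and the assumption that a CR0.TS fault takes priority when both conditions hold is mine, not something the quoted text states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated per-CPU state. Real code would read and write MSRs and CR0. */
struct sim_cpu {
    uint64_t xfd;      /* models IA32_XFD: set bits are disabled components */
    uint64_t xfd_err;  /* models IA32_XFD_ERR */
    int cr0_ts;        /* models CR0.TS */
};

enum nm_cause { NM_LEGACY_TS, NM_XFD };

/*
 * Model of the hardware side: an instruction needing the state components
 * in 'required' executes. Returns 1 if #NM would be raised, 0 otherwise.
 * Assumption: CR0.TS is checked first, and that path leaves xfd_err alone.
 */
int sim_raise_nm(struct sim_cpu *cpu, uint64_t required)
{
    assert(cpu != NULL);
    if (cpu->cr0_ts)
        return 1;                       /* legacy #NM, xfd_err untouched */
    uint64_t blocked = cpu->xfd & required;
    if (blocked) {
        cpu->xfd_err = blocked;         /* logical AND of XFD and required */
        return 1;
    }
    return 0;
}

/*
 * Model of the handler side. A zero xfd_err means the fault was the legacy
 * CR0.TS kind. A nonzero value means XFD, and the handler must clear
 * xfd_err, or the next legacy #NM would be misread as an XFD fault.
 */
enum nm_cause sim_handle_nm(struct sim_cpu *cpu, uint64_t *components)
{
    assert(cpu != NULL && components != NULL);
    uint64_t err = cpu->xfd_err;
    if (err == 0) {
        *components = 0;
        cpu->cr0_ts = 0;                /* stand-in for the legacy lazy restore */
        return NM_LEGACY_TS;
    }
    cpu->xfd_err = 0;                   /* the step the whole scheme depends on */
    cpu->xfd &= ~err;                   /* stand-in for enabling the component */
    *components = err;
    return NM_XFD;
}
```

The entire scheme hangs on the one line that zeroes xfd_err. Drop it and every later CR0.TS fault looks like an XFD fault, which is what I mean by "just barely works".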

And this PASID vs #GP mess is just sad. We’re trying to engineer around an issue that has no need to exist in the first place. For some reason we have two lazy-loading-fault mechanisms showing up in x86 in rapid succession: one of them is maybe 3/4 baked, and the other is totally separate and maybe 1/4 baked.

If ENQCMD instead raised #NM, this would be trivial. (And it would even make sense: the error is, literally, “an instruction tried to use something in XSTATE that isn’t initialized”.) Or if someone had noticed that PASID, kind of like PKRU, didn’t really belong in XSTATE, we wouldn’t have this mess.

Yes, obviously Linux can get all this stuff to work, but maybe Intel could aspire to design features that are straightforward to use well instead of merely possible to use well?