Re: [PATCH v3 09/16] x86/msr: Use the alternatives mechanism for WRMSR

From: H. Peter Anvin

Date: Mon Feb 23 2026 - 16:53:00 EST


On February 23, 2026 9:56:28 AM PST, Xin Li <xin@xxxxxxxxx> wrote:
>
>>>> I _really_ thought this was discussed upfront by Xin before he sent out his
>>>> first version of the series.
>>> I actually reached out to the Intel architects about this before I started
>>> coding. Turns out, if the CPU supports WRMSRNS, you can use it across the
>>> board. The hardware is smart enough to perform a serialized write whenever
>>> a non-serialized one isn't proper, so there’s no risk.
>>
>> Could we be a little more specific here, please?
>
>Sorry as I’m no longer with Intel, I don’t have access to those emails.
>
>Got to mention, also to reply to Sean’s challenge, as usual I didn’t get
>detailed explanation about how would hardware implement WRMSRNS,
>except it falls back to do a serialized write when it’s not *proper*.
>
>
>>
>> If it was universally safe to s/WRMSR/WRMSRNS/, then there wouldn't have
>> been a need for WRMSRNS in the ISA.
>>
>> Even the WRMSRNS description in the SDM talks about some caveats with
>> "performance-monitor events" MSRs. That sounds like it contradicts the
>> idea that the "hardware is smart enough" universally to tolerate using
>> WRMSRNS *EVERYWHERE*.
>>
>> It also says:
>>
>> Like WRMSR, WRMSRNS will ensure that all operations before it do
>> not use the new MSR value and that all operations after the
>> WRMSRNS do use the new value.
>>
>> Which is a handy guarantee for sure. But, it's far short of a fully
>> serializing instruction.
>
>

So to get a little bit of clarity here as to the architectural contract as opposed to the current implementations:

1. WRMSRNS is indeed intended as an opt-in, as opposed to declaring random registers non-serializing a posteriori by sheer necessity in technical violation of the ISA.

We should not blindly replace all WRMSRs with WRMSRNS. We should, however, make wrmsrns() fall back to WRMSR on hardware which does not support it, so we can use it unconditionally at call sites. Many call sites, probably most, could be converted, but for those where it makes no difference performance-wise there is really no reason to take on the testing burden.
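To make the fallback idea concrete, here is a rough userspace model, not kernel code. The struct, the enum and model_emit() are made up for illustration; in the kernel the selection would presumably be patched in once via the alternatives mechanism rather than tested on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which instruction the model "executed"; stands in for real hardware. */
enum msr_insn { INSN_NONE, INSN_WRMSR, INSN_WRMSRNS };

/* Hypothetical model of one CPU; not a kernel structure. */
struct model_cpu {
	bool has_wrmsrns;        /* stands in for the CPUID feature bit */
	enum msr_insn last_insn; /* instruction used by the last write */
	uint32_t last_msr;
	uint64_t last_val;
};

/* Record the write instead of touching a real MSR. */
static void model_emit(struct model_cpu *cpu, enum msr_insn insn,
		       uint32_t msr, uint64_t val)
{
	assert(insn != INSN_NONE);
	cpu->last_insn = insn;
	cpu->last_msr = msr;
	cpu->last_val = val;
}

/*
 * Call sites that have opted in use this unconditionally. Without
 * WRMSRNS support the write degrades to a plain (serializing) WRMSR.
 */
static void wrmsrns(struct model_cpu *cpu, uint32_t msr, uint64_t val)
{
	model_emit(cpu, cpu->has_wrmsrns ? INSN_WRMSRNS : INSN_WRMSR,
		   msr, val);
}
```

The point of the shape is that the opt-in lives at the call site (someone chose wrmsrns() over wrmsr() for that MSR), while the hardware check stays hidden inside the helper.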

It is also quite likely we will find cases where we need *one* serialization after writing to a whole group of MSRs. In that case, we may want to add a sync_cpu_after_wrmsrns() [or something like that] which does a sync_cpu() if and only if WRMSRNS is supported.
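A sketch of what the group-write pattern might look like, again as a userspace model with made-up bookkeeping. sync_cpu() is only a counter here, write_msr_group() is a hypothetical helper, and the model assumes every WRMSRNS write is non-serializing, which real hardware does not have to honor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one CPU; it only counts serializing events. */
struct model_cpu {
	bool has_wrmsrns;
	unsigned int implicit_serializations; /* from the WRMSR fallback */
	unsigned int explicit_serializations; /* from sync_cpu() */
};

struct msr_write {
	uint32_t msr;
	uint64_t val;
};

/* Falls back to WRMSR, which serializes by itself, without WRMSRNS. */
static void wrmsrns(struct model_cpu *cpu, uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	if (!cpu->has_wrmsrns)
		cpu->implicit_serializations++;
}

/* Stand-in for a real serializing operation such as SERIALIZE. */
static void sync_cpu(struct model_cpu *cpu)
{
	cpu->explicit_serializations++;
}

/* Only needed when the preceding writes may not have serialized. */
static void sync_cpu_after_wrmsrns(struct model_cpu *cpu)
{
	if (cpu->has_wrmsrns)
		sync_cpu(cpu);
}

/* Write a whole group of MSRs, then serialize at most once. */
static void write_msr_group(struct model_cpu *cpu,
			    const struct msr_write *w, size_t n)
{
	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		wrmsrns(cpu, w[i].msr, w[i].val);
	sync_cpu_after_wrmsrns(cpu);
}
```

With three writes, the model without WRMSRNS counts three implicit serializations and no explicit one; with WRMSRNS it counts a single explicit one.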

I don't know if there will ever be any CPUs which support WRMSRNS but not SERIALIZE, so it might be entirely reasonable to have WRMSRNS depend on SERIALIZE and not bother with the IRET fallback variation.

2. WRMSRNS *may* perform a fully serializing write if the hardware implementation does not support a faster write method for a certain MSR. This is particularly likely for MSRs that have system-wide consequences, but it is also a legitimate option for the hardware implementation for MSRs that are not expected to have any kind of performance impact (full serialization is a very easy way to ensure full consistency and so reduces implementation and verification burden.)

3. All registers, including MSRs, in x86 are subject to scoreboarding, meaning that so-called "psychic effects" (a direct effect being observable before the cause) or use of stale resources are never permitted. This does *not* imply that events cannot be observed out of order, and cross-CPU visibility has its own rules, but that is not relevant for most registers.

4. WRMSRNS immediate can be reasonably expected to be significantly faster than even WRMSRNS ecx (at least for MSRs deemed valuable to optimize), because the MSR number is available to the hardware at the very beginning of the instruction pipeline. To take proper advantage of that, it is desirable to avoid calling wrmsrns() with a non-constant MSR number in code paths where performance matters, even if it bloats the code somewhat. The main case I can think of that might actually matter is context-switching with perf enabled (also a good example for wanting to SERIALIZE or at least MFENCE or LFENCE after the batch write if the writes need to take effect before returning to user space.) There is also of course the option of dynamically generating a code snippet if the list of MSRs is too dynamic.
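One way to keep the MSR number constant at each write, sketched as a userspace model: dispatch through a switch so the hot MSRs each get their own constant-index call, and only the leftovers take the register form. wrmsrns_imm(), wrmsrns_reg() and write_perf_msr() are hypothetical names, the EX_ constants are illustrative labels for three architectural perf MSR numbers, and the model only counts which form was used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; the numbers are the architectural MSR indices. */
#define EX_MSR_PMC0             0x0c1u
#define EX_MSR_PERFEVTSEL0      0x186u
#define EX_MSR_PERF_GLOBAL_CTRL 0x38fu

/* Hypothetical model: counts which instruction form each write used. */
struct model_cpu {
	unsigned int imm_writes; /* MSR number encoded as an immediate */
	unsigned int reg_writes; /* MSR number passed in ECX */
	uint32_t last_msr;
	uint64_t last_val;
};

/* Stand-in for the immediate form; msr must be a constant here. */
static void wrmsrns_imm(struct model_cpu *cpu, uint32_t msr, uint64_t val)
{
	assert(cpu != NULL);
	cpu->imm_writes++;
	cpu->last_msr = msr;
	cpu->last_val = val;
}

/* Stand-in for the register (ECX) form. */
static void wrmsrns_reg(struct model_cpu *cpu, uint32_t msr, uint64_t val)
{
	assert(cpu != NULL);
	cpu->reg_writes++;
	cpu->last_msr = msr;
	cpu->last_val = val;
}

/* Trade code size for a constant MSR number on the hot MSRs. */
static void write_perf_msr(struct model_cpu *cpu, uint32_t msr, uint64_t val)
{
	switch (msr) {
	case EX_MSR_PMC0:
		wrmsrns_imm(cpu, EX_MSR_PMC0, val);
		break;
	case EX_MSR_PERFEVTSEL0:
		wrmsrns_imm(cpu, EX_MSR_PERFEVTSEL0, val);
		break;
	case EX_MSR_PERF_GLOBAL_CTRL:
		wrmsrns_imm(cpu, EX_MSR_PERF_GLOBAL_CTRL, val);
		break;
	default:
		wrmsrns_reg(cpu, msr, val);
		break;
	}
}
```

Whether the switch itself costs more than the immediate form saves is a measurement question; where the MSR is already known at the call site, calling the constant form directly avoids the dispatch entirely.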