Re: Save a WRMSR GS.base?
From: Andrew Cooper
Date: Mon Jun 08 2026 - 18:59:05 EST
On 08/06/2026 10:21 pm, Borislav Petkov wrote:
> On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
>> I think this is begging to be written down somewhere. Lemme point AI to it and
>> see what it would generate.
> Something like the below... I am thinking of sticking that somewhere under
> Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
> there was already need to document this stuff which is clearly not really
> transparent. What's there is covering the userspace side more tho.
>
> Anyway, the below is a summary of our thread with AI, it ain't half-bad and
> I think we should write it down for future reference.
This contains some mechanics, but is lacking on the background.
> ---
> GS handling on context switches
History
-------
In 32bit, data segments need reloading on entry to the kernel, and
restoring on exit to userspace. Only the segment selector is necessary,
as all segment data resides in the GDT/LDT. Bases in the GDT/LDT are 32
bits wide. The segment selector values are user-chosen, and effectively
arbitrary.
The 32bit mechanism is slow, so in 64bit, segments were made mostly flat
so as to not need reloading on entry/exit. The FS and GS segment bases
were extended to 64 bits and became accessible via MSRs, and a separate
GS_SHADOW value was introduced. The SWAPGS instruction swaps GS_BASE and
GS_SHADOW, as the only action needed on entry/exit.
64bit userspace needed to make the arch_prctl(ARCH_SET_GS) syscall to
have a base value wider than 32 bits, and a side effect of this syscall
was to zero the GS selector.
Then the FSGSBASE instructions came along, and userspace could finally
choose an arbitrary base address not previously registered via syscall.
Linux's ABI promises to preserve both the selector value and the full
base, even when they are disconnected.
Mechanisms
----------
> SWAPGS
>
> Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
> the hidden portion of the GS selector register.
"Swaps the value in "
>
> MOV <segment selector>, GS
>
> (legacy path, non-FRED) Loads the GS selector and fetches the descriptor
> attributes and base from the GDT/LDT.
Loads the GS segment from the GDT/LDT, including selector, attributes,
limit and base. The 32 bits of base from the GDT/LDT are zero-extended
into the 64bit base register.
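To make the zero-extension concrete, here is a minimal sketch of how the
32 bits of base are scattered across a legacy 8-byte GDT/LDT data segment
descriptor. The function name is made up for illustration; it is not a
kernel helper.

```c
#include <stdint.h>

/*
 * Illustrative only: gather the 32-bit base out of a legacy 8-byte
 * segment descriptor and zero-extend it to 64 bits, which is what ends
 * up in the base register after a segment load.
 */
static inline uint64_t desc_base_zero_extended(uint64_t desc)
{
	uint32_t base = 0;

	base |= (uint32_t)((desc >> 16) & 0xffff);      /* base[15:0]  <- desc[31:16] */
	base |= (uint32_t)((desc >> 32) & 0xff) << 16;  /* base[23:16] <- desc[39:32] */
	base |= (uint32_t)((desc >> 56) & 0xff) << 24;  /* base[31:24] <- desc[63:56] */

	return (uint64_t)base;                          /* upper 32 bits are zero */
}
```

Whatever the descriptor holds, the result never has bits above 31 set,
which is why a base above 4G cannot come from the GDT/LDT.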
> Writes a 32-bit base into the active
> GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
> user GS base on its own without surrounding SWAPGS calls.
I wouldn't quite say useless. There are ways without a SWAPGS, although
they're less neat.
The salient point is that MOV GS clobbers the active kernel per-cpu
pointer, so must be done with custom error handling/recovery so
#GP/#PF/NMI/#MC can get back to the correct per-cpu pointer.
>
> LKGS <selector>
>
> (FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
> descriptor attributes, but it redirects the base write — instead of updating
> the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
> (i.e. MSR_KERNEL_GS_BASE).
>
> Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
> descriptors only encode 32-bit bases. This means it cannot correctly represent
> a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
> is still required afterwards.
"Exactly like MOV GS, except that the 32bit zero-extended base value is
written into GS_SHADOW instead of the active GS.base."
"This instruction ensures that the kernel's per-cpu pointer stays good,
and does not need custom error handling."
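The difference between the instructions discussed so far can be sketched
as a toy state model. This is plain C operating on a struct, not kernel
code; the struct and function names are invented for illustration, and
attributes/limit are left out.

```c
#include <stdint.h>

/* Toy model of one CPU's GS state; names are illustrative only. */
struct gs_state {
	uint16_t selector;  /* visible GS selector */
	uint64_t base;      /* active GS.base (per-cpu pointer while in the kernel) */
	uint64_t shadow;    /* GS_SHADOW, i.e. MSR_KERNEL_GS_BASE */
};

/* SWAPGS: exchange the active base with the shadow value. */
static inline void model_swapgs(struct gs_state *gs)
{
	uint64_t tmp = gs->base;

	gs->base = gs->shadow;
	gs->shadow = tmp;
}

/* MOV GS: the zero-extended 32-bit descriptor base lands in the active base. */
static inline void model_mov_gs(struct gs_state *gs, uint16_t sel, uint32_t desc_base)
{
	gs->selector = sel;
	gs->base = desc_base;    /* clobbers the kernel per-cpu pointer */
}

/* LKGS: same load, but the base lands in the shadow instead. */
static inline void model_lkgs(struct gs_state *gs, uint16_t sel, uint32_t desc_base)
{
	gs->selector = sel;
	gs->shadow = desc_base;  /* active base is left alone */
}

/* WRMSR to MSR_KERNEL_GS_BASE: full 64-bit write, selector untouched. */
static inline void model_wrmsr_kernel_gs_base(struct gs_state *gs, uint64_t base)
{
	gs->shadow = base;
}
```

In this model, model_lkgs() followed by model_wrmsr_kernel_gs_base()
sets both the selector and a full 64-bit user base without ever changing
gs->base, whereas model_mov_gs() overwrites it.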
> WRGSBASE <reg>
>
> (with REX.W) Writes a full 64-bit value directly into the currently active
> GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
> the full address space.
>
> The problem in kernel context: the currently active GS.base belongs to the
> kernel, not the user task. So using this during context switching would
> corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
> IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.
The REX.W aspect isn't interesting from a context switching point of
view. It behaves like most other instructions in this regard, even if
it's not interesting to use the 32bit form of WRGSBASE.
>
> WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
>
> Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
> (user-space) GS base — the one that gets swapped into the active GS.base on
> SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
> base during a context switch from kernel mode. WRMSRNS is the non-serializing
> variant, which is preferable here since this MSR is architecturally
> non-serializing anyway.
This is where the AI didn't get it right.
AMD ~silently made FS_BASE/GS_BASE/GS_SHADOW become
non-architecturally-serialising, enumerated by
0x80000021.eax[1].FS_GS_NS. I think this was in Zen3.
Intel still has them as architecturally serialising, requiring software
to opt in to non-serialising behaviour using WRMSRNS.
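As a rough sketch of what that means for software, assuming the bit
position given above (leaf 0x80000021, EAX bit 1), the decision could
look like the following. The enum, function names and the policy itself
are hypothetical and not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position as stated in the text above: CPUID 0x80000021, EAX bit 1. */
#define FS_GS_NS_BIT 1u

/* Decode the FS_GS_NS enumeration from a raw EAX value of that leaf. */
static inline bool fs_gs_ns_enumerated(uint32_t leaf_80000021_eax)
{
	return (leaf_80000021_eax >> FS_GS_NS_BIT) & 1u;
}

/* Hypothetical outcomes for a write to GS_SHADOW. */
enum gs_shadow_write {
	PLAIN_WRMSR_NON_SERIALISING,  /* FS_GS_NS: plain WRMSR already is */
	USE_WRMSRNS,                  /* opt in via WRMSRNS */
	PLAIN_WRMSR_SERIALISING,      /* neither available */
};

/* Hypothetical policy: prefer whatever avoids the serialising write. */
static inline enum gs_shadow_write pick_gs_shadow_write(bool fs_gs_ns, bool has_wrmsrns)
{
	if (fs_gs_ns)
		return PLAIN_WRMSR_NON_SERIALISING;
	if (has_wrmsrns)
		return USE_WRMSRNS;
	return PLAIN_WRMSR_SERIALISING;
}
```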
~Andrew