Re: [PATCH 1/2] x86/segment: Introduce storesegment() helper to write segment selectors to memory

From: Uros Bizjak

Date: Tue Mar 31 2026 - 05:57:47 EST


On Tue, Mar 31, 2026 at 8:56 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> > Introduce a new helper, storesegment(), that stores a segment selector
> > directly into a u16 (or compatible) memory location without using an
> > intermediate general-purpose register.
> >
> > To support this, split the existing SAVE_SEGMENT macro into two parts:
> >
> > SAVE_SEGMENT_VAR(): retains the current behavior of reading a segment
> > register into an unsigned long via a register.
> > SAVE_SEGMENT_PTR(): adds a new variant that writes the 16-bit selector
> > directly to memory.
> >
> > The combined SAVE_SEGMENT() macro now generates both helpers for each
> > segment register.
> >
> > The new storesegment() interface is preferred over savesegment() when
> > the value only needs to be stored (e.g. into a struct field), avoiding
> > an unnecessary register move and making the intent clearer.
> >
> > No functional change for existing users of savesegment().
>
> Why does the API have to be split into =r and =m variants?
>
> Couldn't we use a more generic constraint and let the compiler
> decide what the target is? Would that negatively impact
> other aspects of code generation?

The "=r" variant actually outputs a zero-extended value to the whole
register width. So, the "=r" variant is used to eliminate
zero-extensions when the value is used in follow-up calculations or
comparisons, or when the value is stored to a location that is more
than 16 bits wide. Additionally, the "=r" variant always uses MOVL,
where the operand-size prefix byte (0x66) is not needed.

The "=m" variant only outputs to a 16-bit location. Having "=rm" here
would always emit a 0x66 operand-size prefix when a register is used
as the output, and there would be many zero-extensions emitted, because
the compiler needs to zero-extend the value from 'unsigned short' to
anything wider.
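
To make the difference between the constraints concrete, here is a
minimal user-space sketch. It is not the code from the patch: the
helper names sketch_save_cs(), sketch_store_cs(), sketch_save_cs_rm()
and sketch_check() are made up for illustration, %cs is picked only
because it is readable from user space, and the non-x86 stubs exist
only so the file builds everywhere. It assumes GCC or Clang and a CPU
that zero-extends the selector on a 32-bit register move.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)

/*
 * "=r" form: MOVL into a 32-bit register, so no 0x66 prefix is needed.
 * Assumption: the CPU zero-extends the 16-bit selector (true for all
 * x86-64 parts), so widening the result needs no extra instruction.
 */
static inline unsigned long sketch_save_cs(void)
{
	unsigned int v;

	__asm__ volatile("movl %%cs, %0" : "=r" (v));
	return v;
}

/* "=m" form: writes exactly 16 bits straight to the memory location. */
static inline void sketch_store_cs(uint16_t *p)
{
	__asm__ volatile("movw %%cs, %0" : "=m" (*p));
}

/*
 * "=rm" form, for comparison only: the output type must be 16 bits wide
 * to match the memory case, so the register case becomes MOVW (with a
 * 0x66 prefix) and any wider use needs a separate zero-extension.
 */
static inline uint16_t sketch_save_cs_rm(void)
{
	uint16_t v;

	__asm__ volatile("movw %%cs, %0" : "=rm" (v));
	return v;
}

#else

/* Hypothetical stubs so the sketch still compiles on non-x86 targets. */
static inline unsigned long sketch_save_cs(void) { return 0; }
static inline void sketch_store_cs(uint16_t *p) { *p = 0; }
static inline uint16_t sketch_save_cs_rm(void) { return 0; }

#endif

/* All three forms must observe the same 16-bit selector value. */
static inline void sketch_check(void)
{
	uint16_t sel = 0xffff;

	sketch_store_cs(&sel);
	assert(sketch_save_cs() == sel);
	assert(sketch_save_cs_rm() == sel);
	assert(sketch_save_cs() <= 0xffff);
}
```

All three helpers return the same value; the difference is only in the
generated code. Comparing the -O2 disassembly of callers that widen the
result or store it into a u16 field should show where the prefix and
the extra zero-extensions appear, although the exact output depends on
the compiler version.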

Other than that, GCC (and Clang, too) has serious problems with "=rm"
output constraints. Forward propagation (AKA the combine pass) does not
work reliably with assembly outputs (due to the always-present clobbers
for asm statements), so there will be many cases of moves to a
temporary register and even to a temporary stack location with this
constraint. Having two separate functions (each with a clear and
informative function comment) leaves it to the programmer to decide
which function is optimal for a given use.

Uros.