Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
From: Josh Poimboeuf
Date: Wed Apr 09 2025 - 13:41:49 EST
On Wed, Apr 09, 2025 at 04:38:21PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 08, 2025 at 01:21:17AM -0700, Josh Poimboeuf wrote:
> > Improve the readability and compactness of alternatives code.
> >
> > ---------------------
> > ALTERNATIVE() before:
> > ---------------------
> >
> > # ALT: oldinstr
> > 771:
> > rep movsb
> > 772:
> > # ALT: padding
> > .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> > 773:
> > .pushsection .altinstructions,"a"
> > .long 771b - .
> > .long 774f - .
> > .4byte (((1 << 0) << 16) | ((18*32+ 4)))
> > .byte 773b-771b
> > .byte 775f-774f
> > .popsection
> > .pushsection .altinstr_replacement, "ax"
> > # ALT: replacement
> > 774:
> > call rep_movs_alternative
> > 775:
> > .popsection
> >
> > --------------------
> > ALTERNATIVE() after:
> > --------------------
> >
> > # <ALTERNATIVE>
> > 771: rep movsb
> > 772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> > 773:
> > # ALT ENTRY:
> > .pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
>
> I find this very hard to read. I prefer the multi line form it had
> before.
I actually think the compactness of putting the entry on a single line
is really nice.
We're usually more interested in reading the code *around* the
alternatives, rather than the alternatives themselves.
Making the alt entry a big sprawling mess, for what typically resolves
to just a few instructions, makes that a LOT harder. And it's *really*
bad for ALTERNATIVE_2() and ALTERNATIVE_3().
Also, 99% of the time, we're not going to be debugging the alternatives
themselves. Spreading out the alt entry directives across multiple
lines is way overkill. It wastes space and cognitive load.
That's what the comments are for. All you really need to care about are
the original instructions, the replacement instructions, and potentially
what feature+flags.
We could even print the feature by adding the '(((1 << 0) << 16) |
((18*32+ 4)))' to one of the comments.
Or even better, we can just show the actual human readable feature:
ALT_NOT(X86_FEATURE_FSRM).
Something like:
# <ALTERNATIVE>
771: rep movsb
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
# ALT REPLACEMENT: ALT_NOT(X86_FEATURE_FSRM):
.pushsection .altinstr_replacement, "ax"
774: call rep_movs_alternative
775:
.popsection
# </ALTERNATIVE>
Then with all the relevant information in the comments, there's no
reason to make the alt entry directives themselves very readable, right?
Unless alternatives are broken and you're debugging it. Which should
very much be the exception rather than the rule.
One thing that does bother me slightly is the asymmetry of having the
.pushsection and .popsection on the same line, whereas the replacement
has them on separate lines. We could just put the .popsection on its
own line:
# <ALTERNATIVE>
771: rep movsb
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f
.popsection
# ALT REPLACEMENT: ALT_NOT(X86_FEATURE_FSRM):
.pushsection .altinstr_replacement, "ax"
774: call rep_movs_alternative
775:
.popsection
# </ALTERNATIVE>
--
Josh