Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()

From: H. Peter Anvin
Date: Thu Sep 09 2021 - 17:29:15 EST


On 9/9/21 10:01 AM, Borislav Petkov wrote:
> On Wed, Sep 08, 2021 at 10:17:16AM -0700, H. Peter Anvin (Intel) wrote:
>
>> Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()
>
> "pessimize" huh? :)
>
> Why not simply
>
> "Do not waste registers in the pre-initialization case..."


Because it is shorter and thus can fit more content.

> ?

>> gcc will sometimes manifest the address of boot_cpu_data in a register
>> as part of constant propagation. When multiple static_cpu_has() are
>> used this may foul the mainline code with a register load which will
>> only be used on the fallback path, which is unused after
>> initialization.

> So a before-after thing looks like this here:
>
> before:
>
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517: f6 05 cb 09 cb ff 80 testb $0x80,-0x34f635(%rip) # ffffffff89346ee9 <boot_cpu_data+0x69>
> ffffffff8969651e: 0f 85 fc 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696524: e9 ee 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff89696529: f6 45 6a 08 testb $0x8,0x6a(%rbp)
> ffffffff8969652d: 0f 85 45 b9 97 f7 jne ffffffff81011e78 <intel_pmu_lbr_filter+0x68>
> ffffffff89696533: e9 95 b9 97 f7 jmp ffffffff81011ecd <intel_pmu_lbr_filter+0xbd>
> ffffffff89696538: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)
> ffffffff8969653e: 0f 85 d3 bc 97 f7 jne ffffffff81012217 <intel_pmu_store_lbr+0x77>
> ffffffff89696544: e9 d9 bc 97 f7 jmp ffffffff81012222 <intel_pmu_store_lbr+0x82>
> ffffffff89696549: 41 f6 44 24 6a 08 testb $0x8,0x6a(%r12)
>
> after:
>
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517: f6 04 25 e9 6e 34 89 testb $0x80,0xffffffff89346ee9
> ffffffff8969651e: 80
> ffffffff8969651f: 0f 85 fb 3e fb ff jne ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696525: e9 ed 3e fb ff jmp ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff8969652a: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
> ffffffff89696531: 08
> ffffffff89696532: 0f 85 37 b9 97 f7 jne ffffffff81011e6f <intel_pmu_lbr_filter+0x5f>
> ffffffff89696538: e9 89 b9 97 f7 jmp ffffffff81011ec6 <intel_pmu_lbr_filter+0xb6>
> ffffffff8969653d: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
> ffffffff89696544: 08
> ffffffff89696545: 0f 85 b5 bc 97 f7 jne ffffffff81012200 <intel_pmu_store_lbr+0x70>
> ffffffff8969654b: e9 bb bc 97 f7 jmp ffffffff8101220b <intel_pmu_store_lbr+0x7b>
> ffffffff89696550: f6 04 25 ea 6e 34 89 testb $0x8,0xffffffff89346eea
>
> so you're basically forcing an immediate thing.
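
The difference is essentially which constraint the fallback asm uses for
the boot_cpu_data operand. Here is a minimal stand-alone sketch of the two
shapes (this is not the kernel's actual static_cpu_has(); fake_boot_cpu_data
and the has_bit3_*() helpers are invented names, and it assumes a non-PIC
build, e.g. gcc -O2 -fno-pic -c):

struct fake_cpu_data {                          /* stand-in for boot_cpu_data */
        unsigned char x86_capability[80];
};
extern struct fake_cpu_data fake_boot_cpu_data;

/*
 * "m" operand: gcc chooses the addressing mode.  With several users and
 * constant propagation it may first materialize &fake_boot_cpu_data in a
 * register and address the byte as 0x6a(%reg), as in the "before" dump.
 */
static inline int has_bit3_mem(void)
{
        unsigned char ret;

        asm("testb $0x8, %[byte]\n\t"
            "setnz %[ret]"
            : [ret] "=q" (ret)
            : [byte] "m" (fake_boot_cpu_data.x86_capability[0x6a]));
        return ret;
}

/*
 * "i" operand carrying the address, printed with the %P modifier (bare
 * constant, no '$'): the address is encoded as an absolute displacement
 * in the instruction itself, e.g. testb $0x8,fake_boot_cpu_data+0x6a,
 * so no register is tied up, as in the "after" dump.
 */
static inline int has_bit3_imm(void)
{
        unsigned char ret;

        asm("testb $0x8, %P[addr]\n\t"
            "setnz %[ret]"
            : [ret] "=q" (ret)
            : [addr] "i" (&fake_boot_cpu_data.x86_capability[0x6a]));
        return ret;
}

With the "i" form the address lives entirely inside the cold .altinstr_aux
instruction as an absolute displacement, so the mainline code no longer has
to keep a base register live for a path that is dead after initialization.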

> And you wanna get rid of the (%<reg>) relative addressing and force it
> to be rip-relative.

>> Explicitly force gcc to use immediate (rip-relative) addressing for
>
> Right, the rip-relative addressing doesn't happen here:


Indeed it doesn't (egg on my face), nor, as it turns out, is there
currently a way to make it do so (just adding (%%rip) breaks i386, and
there is no equivalent to %{pP} which adds the suffix). Let me fix
both; I will have a patchset shortly.
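
The obvious workaround is to paste the suffix in by hand, gated on the
target, roughly like the sketch below (RIP_REL() and flag_bit3_set() are
invented names purely for illustration, not the coming patchset; it assumes
a non-PIC/non-PIE build):

extern unsigned char some_flag;                 /* made-up example symbol */

#ifdef __x86_64__
/* 64-bit: hand-append the suffix -> testb $0x8,some_flag(%rip) */
# define RIP_REL(opnd)  opnd "(%%rip)"
#else
/*
 * i386 has no RIP-relative addressing; only the bare absolute form
 * testb $0x8,some_flag assembles, so the suffix must be dropped.
 */
# define RIP_REL(opnd)  opnd
#endif

static inline int flag_bit3_set(void)
{
        unsigned char ret;

        asm("testb $0x8, " RIP_REL("%P[addr]") "\n\t"
            "setnz %[ret]"
            : [ret] "=q" (ret)
            : [addr] "i" (&some_flag));
        return ret;
}

On 64-bit the RIP-relative form is also one byte shorter than the absolute
one (7 vs. 8 bytes for the testb in the dumps above); on i386 the absolute
form is the only option anyway.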

-hpa