Re: [PATCH] x86/asm: Use asm_inline() instead of asm() in amd_clear_divider()
From: Borislav Petkov
Date: Fri Mar 14 2025 - 09:23:46 EST
On Fri, Mar 14, 2025 at 11:17:43AM +0100, Ingo Molnar wrote:
> Here's a link for those who'd like to view this via the web:
>
> https://lore.kernel.org/all/174188884263.14745.1542926632284353047.tip-bot2@tip-bot2/
This is a perf measuring method I actually got from you a long time ago :-)
./tools/perf/perf stat -a --repeat 5 --sync --pre ~/bin/pre-build-kernel.sh -- make -s -j33 bzImage
* tip/master fdebf9c0efe4 ("Merge branch into tip/master: 'x86/sev'")
 Performance counter stats for 'system wide' (5 runs):

       4,144,101.54 msec cpu-clock               #   32.000 CPUs utilized            ( +- 0.10% )
            812,478      context-switches        #  196.056 /sec                     ( +- 0.15% )
             67,201      cpu-migrations          #   16.216 /sec                     ( +- 0.22% )
         48,228,560      page-faults             #   11.638 K/sec                    ( +- 0.01% )
  9,473,229,339,058      instructions            #     1.12 insn per cycle
                                                 #     0.21 stalled cycles per insn  ( +- 0.00% )
  8,476,070,185,458      cycles                  #    2.045 GHz                      ( +- 0.12% )
  1,988,775,653,131      stalled-cycles-frontend #   23.46% frontend cycles idle     ( +- 0.14% )
  2,128,585,400,027      branches                #  513.642 M/sec                    ( +- 0.00% )
     66,681,861,375      branch-misses           #    3.13% of all branches          ( +- 0.03% )

          129.504 +- 0.127 seconds time elapsed  ( +- 0.10% )
* tip/master with 9628d19e91f1 reverted
 Performance counter stats for 'system wide' (5 runs):

       4,141,057.45 msec cpu-clock               #   32.000 CPUs utilized            ( +- 0.15% )
            811,299      context-switches        #  195.916 /sec                     ( +- 0.08% )
             67,644      cpu-migrations          #   16.335 /sec                     ( +- 0.24% )
         48,209,829      page-faults             #   11.642 K/sec                    ( +- 0.00% )
  9,465,299,000,193      instructions            #     1.12 insn per cycle
                                                 #     0.21 stalled cycles per insn  ( +- 0.00% )
  8,487,239,564,102      cycles                  #    2.050 GHz                      ( +- 0.21% )
  1,992,414,836,889      stalled-cycles-frontend #   23.48% frontend cycles idle     ( +- 0.08% )
  2,127,019,426,911      branches                #  513.642 M/sec                    ( +- 0.00% )
     66,698,031,504      branch-misses           #    3.14% of all branches          ( +- 0.02% )

          129.408 +- 0.195 seconds time elapsed  ( +- 0.15% )
This is all within the noise.
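A rough way to sanity-check that, as a sketch only: take the two "seconds time elapsed" lines above and compare their difference against the reported spreads. Adding the two spreads in quadrature treats the runs as independent, which is an assumption of this sketch and not something perf itself does. The helper name within_noise() is made up for illustration.

```python
import math

# Elapsed wall-clock time (seconds) and its "+-" spread,
# copied from the two perf stat runs above.
baseline = (129.504, 0.127)   # tip/master fdebf9c0efe4
reverted = (129.408, 0.195)   # tip/master with 9628d19e91f1 reverted


def within_noise(a, b):
    """Return (delta, combined_spread, verdict) for two (mean, spread) pairs.

    Hypothetical helper: the spreads are combined in quadrature, which
    assumes the two measurements are independent.
    """
    delta = abs(a[0] - b[0])
    combined = math.hypot(a[1], b[1])
    return delta, combined, delta <= combined


delta, combined, ok = within_noise(baseline, reverted)
print(f"delta = {delta:.3f}s, combined spread = {combined:.3f}s, within noise: {ok}")
```

With the numbers above, the difference is 0.096 s against a combined spread of roughly 0.233 s, so the elapsed times cannot be told apart.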
Or maybe building the kernel, even with those "optimized" inlining decisions
due to the asm being of length 1 for atomic locking insns, simply doesn't matter.
Or maybe I need a different benchmark.
At least it ain't breaking anything...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette