Hi!
From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>...
[ Upstream commit 04c3024560d3a14acd18d0a51a1d0a89d29b7eb5 ]
AMD does not have the requirement for a synchronization barrier when
acccessing a certain group of MSRs. Do not incur that unnecessary
penalty there.
Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option
with and without weak_wrmsr_fence() on a Zen4 system also showed significant
performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores
CCX or CCD and just picks random vCPU.
Average throughput (10 iterations) with weak_wrmsr_fence(),
Cumulative throughput: 4933374 IPI/s
Average throughput (10 iterations) without weak_wrmsr_fence(),
Cumulative throughput: 6355156 IPI/s
[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench
Speed improvement, not a bugfix. Please drop.