Re: performance anomaly in rep movsq/movsb as seen on Sapphire Rapids executing sync_regs()


From: Dave Hansen

Date: Wed Dec 03 2025 - 16:37:09 EST

On 11/26/25 22:55, Mateusz Guzik wrote:
> I figured movsq still sucks on the uarch, so I patched the kernel to use
> movsb instead, but performance barely budged.
>
> However, forcing the thing to do the copy with regular stores in
> memcpy_orig (32 bytes per loop iteration + 8 bytes tail) unclogs it.

Any chance this can be reproduced in userspace somehow? Does any old
copy of 168 bytes do better with regular stores than rep movsq?