Re: performance anomaly in rep movsq/movsb as seen on Sapphire Rapids executing sync_regs()

From: Dave Hansen

Date: Wed Dec 03 2025 - 16:37:09 EST


On 11/26/25 22:55, Mateusz Guzik wrote:
> I figured movsq still sucks on the uarch, so I patched the kernel to use
> movsb instead, but performance barely budged.
>
> However, forcing the thing to do the copy with regular stores in
> memcpy_orig (32 bytes per loop iteration + 8 bytes tail) unclogs it.

Any chance this can be reproduced in userspace somehow? Does any old
copy of 168 bytes do better with regular stores than rep movsq?
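
One way to try this in userspace is a small microbenchmark along the following lines. It is only a sketch under stated assumptions: x86-64, GCC or Clang inline asm, and a hot, cache-resident 168-byte buffer copied in a tight loop. The helper names (copy_rep_movsq, copy_stores, ns_per_copy) are made up for illustration and are not kernel functions. The store loop only imitates the shape described above (32 bytes per iteration plus an 8-byte tail); it is not the kernel's memcpy_orig. A loop like this may well fail to show the anomaly if it depends on the context in which sync_regs() runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define COPY_BYTES 168	/* size from the question above: 21 quadwords */

/* Copy with "rep movsq". len must be a multiple of 8. */
static void copy_rep_movsq(void *dst, const void *src, size_t len)
{
	size_t qwords = len / 8;

	/* RDI = dst, RSI = src, RCX = count; DF is clear per the ABI. */
	__asm__ volatile("rep movsq"
			 : "+D"(dst), "+S"(src), "+c"(qwords)
			 :
			 : "memory");
}

/*
 * Copy with plain 8-byte loads and stores: 32 bytes per loop iteration,
 * then an 8-byte-at-a-time tail. The volatile qualifiers stop the
 * compiler from vectorizing the loop or turning it into a memcpy() call.
 * len must be a multiple of 8.
 */
static void copy_stores(void *dst, const void *src, size_t len)
{
	volatile uint64_t *d = dst;
	const volatile uint64_t *s = src;

	while (len >= 32) {
		d[0] = s[0];
		d[1] = s[1];
		d[2] = s[2];
		d[3] = s[3];
		d += 4;
		s += 4;
		len -= 32;
	}
	while (len >= 8) {
		*d++ = *s++;
		len -= 8;
	}
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Average wall-clock nanoseconds per COPY_BYTES copy over iters runs. */
static double ns_per_copy(copy_fn fn, unsigned long iters)
{
	static uint64_t src[COPY_BYTES / 8], dst[COPY_BYTES / 8];
	struct timespec t0, t1;

	for (size_t i = 0; i < COPY_BYTES / 8; i++)
		src[i] = i + 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned long i = 0; i < iters; i++)
		fn(dst, src, COPY_BYTES);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Calling ns_per_copy(copy_rep_movsq, n) and ns_per_copy(copy_stores, n) from main() with a few million iterations each, pinned to one CPU, and comparing the two numbers on the affected machine would answer the question for the simple hot-loop case.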