Re: [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic()

From: David Laight
Date: Wed Mar 19 2025 - 09:07:53 EST

Next message: Christian Brauner: "Re: [PATCH] fs: predict not reaching the limit in alloc_empty_file()"
Previous message: Theodore Ts'o: "Re: [PATCH] ext4: fix OOB read when checking dotdot dir"
In reply to: Herton Krzesinski: "Re: [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic()"
Next in thread: David Laight: "Re: [PATCH] x86: add back the alignment of the destination to 8 bytes in copy_user_generic()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 18 Mar 2025 19:50:41 -0300
Herton Krzesinski <hkrzesin@xxxxxxxxxx> wrote:

> On Tue, Mar 18, 2025 at 6:59 PM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
...
> For Intel, I was looking and looks like after Sandy Bridge based CPUs
> most/almost all have ERMS, and FSRM is something only newer ones have.
> So the way back to Ivy Bridge is ERMS and not FSRM.

ERMS behaves much the same as FSRM.
The cost of the first transfer is a few clocks higher (maybe 30 not 24),
and (IIRC) the overhead for the next couple of blocks is a bit bigger.
Reading Agner's tables (again) Haswell will do 32 bytes/clock
(for an aligned destination) whereas Sandy/Ivy Bridge 'only' do 16.
I doubt it is enough to treat them differently.
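
To put rough numbers on that, here is a back-of-envelope sketch in C.
It assumes a deliberately crude model: a fixed setup cost plus one clock
per 'bytes_per_clock' bytes. The function name est_clocks() is made up
for illustration, the inputs are the approximate figures quoted above,
and the extra overhead on the first few blocks is ignored.

```c
#include <stddef.h>

/*
 * Crude estimate only (an assumption, not a measurement):
 * clocks = setup + ceil(len / bytes_per_clock).
 * bytes_per_clock must be non-zero.
 */
static size_t est_clocks(size_t len, size_t setup, size_t bytes_per_clock)
{
    return setup + (len + bytes_per_clock - 1) / bytes_per_clock;
}
```

With a 30 clock setup, this model puts a 64 byte copy at 34 clocks at
16 bytes/clock and 32 clocks at 32 bytes/clock, so for short copies the
setup cost dominates and the throughput difference barely shows.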

The real issue with using (aligned) 'rep movsq' was the 140 clock
setup cost on P4 NetBurst (and no one cares about that any more).
I don't think anything else really needs an open coded loop.
There is no hint in the tables of any AMD CPU (going way back)
having long setup times.

The differing cost of different ways of aligning the copy will show
up most on short copies.
You also need to benchmark differing sizes/alignments - otherwise the branch
predictor will get it right every time - which it doesn't in 'real code'.
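
One way to set up such a benchmark, sketched in userspace C. All names
here (struct copy_case, fill_cases(), run_cases()) are hypothetical and
memcpy() merely stands in for the routine under test; this is not the
kernel code or an existing test harness. The idea is to precompute a
table of pseudo-random lengths and source/destination misalignments so
that successive copies do not follow a pattern the branch predictor can
learn, then time run_cases() as a whole.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One copy to perform: length plus misalignment of each buffer. */
struct copy_case {
    uint16_t len;      /* bytes to copy, 1..max_len */
    uint8_t  dst_off;  /* destination misalignment, 0..7 */
    uint8_t  src_off;  /* source misalignment, 0..7 */
};

/* xorshift32: small deterministic PRNG so runs are repeatable. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill 'cases' with n mixed sizes/alignments; max_len must be non-zero. */
static void fill_cases(struct copy_case *cases, size_t n,
                       uint16_t max_len, uint32_t seed)
{
    uint32_t state = seed ? seed : 1;  /* xorshift state must not be 0 */

    for (size_t i = 0; i < n; i++) {
        uint32_t r = xorshift32(&state);

        cases[i].len = (uint16_t)(1 + r % max_len);
        cases[i].dst_off = (uint8_t)((r >> 16) & 7);
        cases[i].src_off = (uint8_t)((r >> 20) & 7);
    }
}

/*
 * Perform every copy once and return the total bytes copied.
 * 'dst' and 'src' must each hold at least max_len + 7 bytes.
 * The caller wraps this call in whatever timer it prefers.
 */
static size_t run_cases(const struct copy_case *cases, size_t n,
                        unsigned char *dst, const unsigned char *src)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        memcpy(dst + cases[i].dst_off, src + cases[i].src_off,
               cases[i].len);
        total += cases[i].len;
    }
    return total;
}
```

Keeping max_len small (say 256) concentrates the test on short copies,
which is where the different alignment strategies should differ most.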

David
