Performance issues in copy_user_generic() in x86_64

From: Herton R. Krzesinski
Date: Fri Mar 14 2025 - 13:53:55 EST


Hello,

I recently received two reports of performance loss in copy_user_generic()
after the updates to the user copy functions on x86_64, seen when benchmarking
with iperf3. I believe that aligning writes to 8 bytes, which the old
ALIGN_DESTINATION macro did, was helping in some cases, and that the
performance drop became noticeable once it was removed. Some performance
testing I did appears to corroborate this theory.
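
To make the alignment idea concrete, here is a minimal sketch of the
arithmetic only, not the kernel code: it computes how many bytes would be
copied one at a time before the destination reaches an 8-byte boundary, and
how the rest of the copy splits into 8-byte words plus a tail. The function
names and the destination address are hypothetical; 16384 is the iperf3 -l
value used in the benchmark script.

```shell
#!/bin/bash
# Illustrative arithmetic only; this is not the kernel implementation.

# head_bytes DST: bytes to copy singly before DST is 8-byte aligned
# (0 if DST is already aligned).
head_bytes() {
    local dst=$(( $1 ))
    echo $(( (8 - (dst & 7)) & 7 ))
}

# split_copy DST LEN: print "head qwords tail" for a copy of LEN bytes.
split_copy() {
    local dst=$(( $1 )) len=$(( $2 ))
    local head rest
    head=$(head_bytes "$dst")
    # A short copy may end before the boundary is reached.
    if (( head > len )); then
        head=$len
    fi
    rest=$(( len - head ))
    echo "$head $(( rest / 8 )) $(( rest % 8 ))"
}

# Hypothetical destination address, 3 bytes past an 8-byte boundary.
split_copy 0x1003 16384    # prints "5 2047 3"
```

With a destination that is already aligned, the head is 0 and the whole
16384-byte buffer is written as 2048 aligned 8-byte words.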

Please take a look at the patch in the following email and check whether
everything is sane. I have already done some testing, as explained in the
patch changelog. I used the following scripts to run the tests. I wrote them
just to get the job done and collect some results, so there is nothing fancy
about them.

---- bench.sh
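#!/bin/bash
# Usage: ./bench.sh <output-dir>

dir=$1
mkdir -p "$dir"

# Pin the iperf3 server to CPU 19, 21 or 23, or leave it unpinned ("none").
for cpu in 19 21 23 none; do
    sync
    echo 3 > /proc/sys/vm/drop_caches
    cpu_opt=""
    if [ "$cpu" != "none" ]; then
        cpu_opt="taskset -c $cpu"
    fi
    # $cpu_opt is left unquoted on purpose so that it splits into words.
    $cpu_opt iperf3 -D -s -B 127.0.0.1 -p 12000
    # The client always runs on CPU 17; perf stat writes its report via -o.
    perf stat -o "$dir/stat.$cpu.txt" taskset -c 17 iperf3 -c 127.0.0.1 -b 0/1000 -V -n 50G --repeating-payload -l 16384 -p 12000 --cport 12001 > "$dir/stat-$cpu.txt" 2>&1
    cat "$dir/stat.$cpu.txt" >> "$dir/stat-$cpu.txt"
    rm -f "$dir/stat.$cpu.txt"
    killall iperf3
done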
----

---- stat.sh
#!/bin/bash
# Usage: ./stat.sh <output-dir>

dir=$1
printf " %4s %13s %12s %12s %11s\n" "CPU" "RATE " "SYS " "TIME " "sender-receiver"

for cpu in 19 21 23 none; do
    time=$(grep 'seconds time elapsed' "$dir/stat-$cpu.txt" | awk '{ print $1 }')
    sys=$(grep 'seconds sys' "$dir/stat-$cpu.txt" | awk '{ print $1 }')
    rate=$(grep ' sender' "$dir/stat-$cpu.txt" | awk '{ print $7 $8 }')
    cpuu=$(grep 'CPU Utilization' "$dir/stat-$cpu.txt" | awk '{ printf "%s-%s\n", $4, $7 }')

    # Pass the values as arguments instead of embedding them in the format string.
    printf "Server bind %4s: %s %s %s %s\n" "$cpu" "$rate" "$sys" "$time" "$cpuu"
done
----

Example of a test run:
nice -n -20 ./bench.sh align
./stat.sh align