On 6/22/2021 5:30 PM, Ben Dooks wrote:
On 19/06/2021 12:21, Akira Tsukamoto wrote:
Optimizing copy_to_user and copy_from_user.
I rewrote the functions in v2, heavily influenced by Garry's memcpy
function [1].
The functions must be written in assembler to handle page faults manually
inside the function.
The changes improve the CPU usage percentage and give some improvement
in network speed for UDP packets.
This only patches copy_user; memcpy is left as the original.
All results are from the same base kernel, same rootfs and same
BeagleV beta board.
The comparison was made with "perf top -Ue task-clock" while running iperf3.
I did a quick test on a SiFive Unmatched with IO to an NVME.
before: cached-reads=172.47MB/sec, buffered-reads=135.8MB/sec
with-patch: cached-reads=177.54MB/sec, buffered-reads=137.79MB/sec
That was just one test run, but it shows a small improvement. I am
sort of surprised we didn't get more of a win from this.
perf record on hdparm shows that it spends approx 15% cpu time in
asm_copy_to_user. Does anyone have a benchmark for this which just
looks at copy to/from user? If not, should we create one?
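One way to look at mostly the user copy routines is a small userspace
loop that pushes a buffer through a pipe. The sketch below is only a
starting point under stated assumptions, not an established benchmark:
on Linux, write() to a pipe is expected to go through the copy-from-user
path and read() through the copy-to-user path, but syscall and pipe
overhead is included in the timing. The name pipe_copy_mbps and the
4096-byte chunk limit are choices made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `chunk` bytes into a pipe and read them
 * back, `iterations` times, and return the throughput in MB/s.
 * Returns -1.0 on bad arguments or on any error.
 *
 * Assumption: each write() copies from user space and each read()
 * copies to user space, so the loop is dominated by those two copies
 * plus syscall overhead.
 */
double pipe_copy_mbps(size_t chunk, size_t iterations)
{
    int fds[2];
    struct timespec t0, t1;
    double secs, result = -1.0;
    char *src, *dst;
    size_t i;

    /* Keep the chunk small so a write to an empty pipe never blocks. */
    if (chunk == 0 || chunk > 4096 || iterations == 0)
        return -1.0;

    src = malloc(chunk);
    dst = malloc(chunk);
    if (!src || !dst || pipe(fds) != 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xa5, chunk);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++) {
        if (write(fds[1], src, chunk) != (ssize_t)chunk)
            goto out;
        if (read(fds[0], dst, chunk) != (ssize_t)chunk)
            goto out;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    /* Only report a number if the data survived the round trip. */
    if (secs > 0.0 && memcmp(src, dst, chunk) == 0)
        result = (double)chunk * (double)iterations / secs / 1e6;
out:
    close(fds[0]);
    close(fds[1]);
    free(src);
    free(dst);
    return result;
}
```

Calling pipe_copy_mbps(4096, 1000000) from a small main() before and
with the patch, and running it under perf record, would show whether
the time really lands in the user copy functions on a given board.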
Thanks for the result on the Unmatched with hdparm. Have you tried
iperf3?
The 15% is high. Is it from before or with the patch?
Akira