Re: [PATCH net-next v25 00/13] Device Memory TCP

From: Yunsheng Lin
Date: Mon Sep 09 2024 - 07:23:52 EST


On 2024/9/9 13:43, Mina Almasry wrote:

>
> Perf - page-pool benchmark:
> ---------------------------
>
> bench_page_pool_simple.ko tests with and without these changes:
> https://pastebin.com/raw/ncHDwAbn
>
> AFAIK the number that really matters in the perf tests is the
> 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
> cycles without the changes but there is some 1 cycle noise in some
> results.
>
> With the patches this regresses to 9 cycles with the changes but there
> is 1 cycle noise occasionally running this test repeatedly.
>
> Lastly I tried disable the static_branch_unlikely() in
> netmem_is_net_iov() check. To my surprise disabling the
> static_branch_unlikely() check reduces the fast path back to 8 cycles,
> but the 1 cycle noise remains.

Sorry for the late report. I was adding a page_pool test ko based on [1]
to avoid introducing a performance regression when fixing the bug in [2].
I also used it to test the performance impact of the devmem patchset on
page_pool. There seems to be a noticeable and quite stable performance
impact for the testcases below: about 5%~16% degradation in elapsed time
on an arm64 system:

Before the devmem patchset:
Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.167561      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.40% )
                 8      context-switches          #    0.474 K/sec                    ( +-  0.65% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +-100.00% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.13% )
          44576552      cycles                    #    2.597 GHz                      ( +-  0.40% )
          59627412      instructions              #    1.34  insn per cycle           ( +-  0.03% )
          14370325      branches                  #  837.063 M/sec                    ( +-  0.02% )
             21902      branch-misses             #    0.15% of all branches          ( +-  0.27% )

       6.818873600 seconds time elapsed                                          ( +-  0.02% )

Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.595423      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.01% )
                 8      context-switches          #    0.460 K/sec                    ( +-  0.50% )
                 0      cpu-migrations            #    0.000 K/sec
                84      page-faults               #    0.005 M/sec                    ( +-  0.15% )
          45693020      cycles                    #    2.597 GHz                      ( +-  0.01% )
          59676212      instructions              #    1.31  insn per cycle           ( +-  0.00% )
          14385384      branches                  #  817.564 M/sec                    ( +-  0.00% )
             21786      branch-misses             #    0.15% of all branches          ( +-  0.14% )

       4.098627802 seconds time elapsed                                          ( +-  0.11% )

After the devmem patchset:
Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.047973      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.39% )
                 8      context-switches          #    0.488 K/sec                    ( +-  0.82% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +- 70.35% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.12% )
          44269558      cycles                    #    2.597 GHz                      ( +-  0.39% )
          59594383      instructions              #    1.35  insn per cycle           ( +-  0.02% )
          14362599      branches                  #  842.481 M/sec                    ( +-  0.02% )
             21949      branch-misses             #    0.15% of all branches          ( +-  0.25% )

       7.964890303 seconds time elapsed                                          ( +-  0.16% )

Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.660975      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.02% )
                 8      context-switches          #    0.458 K/sec                    ( +-  0.57% )
                 0      cpu-migrations            #    0.003 K/sec                    ( +- 43.81% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.17% )
          45862652      cycles                    #    2.597 GHz                      ( +-  0.02% )
          59764866      instructions              #    1.30  insn per cycle           ( +-  0.01% )
          14404323      branches                  #  815.602 M/sec                    ( +-  0.01% )
             21826      branch-misses             #    0.15% of all branches          ( +-  0.19% )

       4.304644609 seconds time elapsed                                          ( +-  0.75% )
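
As a rough sketch of how the 5%~16% figure follows from the elapsed times
above, the relative slowdown can be computed as below. This is Python used
only for illustration; the helper name slowdown_pct is hypothetical, and the
inputs are the "seconds time elapsed" values copied from the perf output.
It assumes elapsed wall-clock time is the metric of interest, since the
cycle and instruction counts of the insmod process itself barely change.

```python
def slowdown_pct(before_s: float, after_s: float) -> float:
    """Relative increase in elapsed time, in percent (hypothetical helper)."""
    return (after_s - before_s) / before_s * 100.0


# "seconds time elapsed" values from the perf stat runs above
# (before the devmem patchset, after the devmem patchset).
runs = {
    "test_napi=1": (6.818873600, 7.964890303),
    "test_napi=1 test_direct=1": (4.098627802, 4.304644609),
}

for name, (before_s, after_s) in runs.items():
    print(f"{name}: {slowdown_pct(before_s, after_s):.1f}% slower")
```

With these inputs the test_napi=1 case comes out at roughly 16.8% and the
test_direct=1 case at roughly 5.0%, which matches the range quoted above.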

1. https://lore.kernel.org/all/20240906073646.2930809-2-linyunsheng@xxxxxxxxxx/
2. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@xxxxxxxxxx/T/

>