Re: [PATCH net-next v5 01/13] mm: page_frag: add a test module for page_frag

From: Yunsheng Lin
Date: Fri May 31 2024 - 04:50:49 EST


On 2024/5/30 23:16, Jakub Kicinski wrote:
> On Thu, 30 May 2024 17:17:17 +0800 Yunsheng Lin wrote:
>>> Is this test actually meaningfully testing page_frag or rather
>>> the objpool construct and the scheduler? :S
>>
>> For the objpool part, I guess it is ok to say that it is a
>> meaningful test of both page_frag and objpool if either of
>> them changes.
>
> Why guess when you can measure it.
> Slow one down and see if it impacts the benchmark.

Before the slowdown, on an arm64 system:

Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17' (500 runs):

         19.420606      task-clock (msec)         #    0.001 CPUs utilized            ( +-  0.82% )
                 7      context-switches          #    0.377 K/sec                    ( +-  0.30% )
                 1      cpu-migrations            #    0.038 K/sec                    ( +-  2.82% )
                84      page-faults               #    0.004 M/sec                    ( +-  0.06% )
          50423999      cycles                    #    2.596 GHz                      ( +-  0.82% )
          35558295      instructions              #    0.71  insn per cycle           ( +-  0.09% )
           8340405      branches                  #  429.462 M/sec                    ( +-  0.08% )
             20669      branch-misses             #    0.25% of all branches          ( +-  0.10% )

      24.047641626 seconds time elapsed                                               ( +-  0.08% )


There are 5120000 push and pop operations in each iteration, so each
push and pop operation costs roughly 4697ns.
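
The per-operation figure is simply the mean elapsed time from perf
divided by the number of operations. A minimal sketch of that
arithmetic in C, where per_op_ns() is a hypothetical helper name used
only for illustration and is not part of the test module:

```c
#include <assert.h>

/*
 * Hypothetical helper: mean cost of one push/pop operation in
 * nanoseconds, given the mean elapsed time reported by perf stat
 * (in seconds) and the number of operations per run.
 */
static double per_op_ns(double elapsed_sec, unsigned long nr_ops)
{
	assert(nr_ops > 0);	/* avoid dividing by zero */

	return elapsed_sec * 1e9 / (double)nr_ops;
}
```

Calling per_op_ns(24.047641626, 5120000) comes to roughly 4.7us per
operation.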

After adding a 50ns delay in __page_frag_alloc_va_align():
@@ -300,6 +297,8 @@ void *__page_frag_alloc_va_align(struct page_frag_cache *nc,
 {
 	unsigned int remaining = nc->remaining & align_mask;

+	ndelay(50);
+
 	if (unlikely(fragsz > remaining)) {


We have:
Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17' (500 runs):

         18.012657      task-clock (msec)         #    0.001 CPUs utilized            ( +-  0.01% )
                 7      context-switches          #    0.395 K/sec                    ( +-  0.20% )
                 1      cpu-migrations            #    0.052 K/sec                    ( +-  1.35% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.06% )
          46765406      cycles                    #    2.596 GHz                      ( +-  0.01% )
          35253336      instructions              #    0.75  insn per cycle           ( +-  0.00% )
           8277063      branches                  #  459.514 M/sec                    ( +-  0.00% )
             20558      branch-misses             #    0.25% of all branches          ( +-  0.07% )

      24.313647557 seconds time elapsed                                               ( +-  0.07% )


(24.313647557 - 24.047641626) * 1000000000 / 5120000 = ~52ns, which
matches the 50ns delay added, so the test does seem to be measuring
page_frag.
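
The same check can be written as a small C helper for anyone who
wants to repeat it with a different delay. This is only a sketch;
added_delay_ns() is a hypothetical name, and the inputs are the two
mean elapsed times reported by perf above:

```c
#include <assert.h>

/*
 * Hypothetical helper: extra cost per operation in nanoseconds,
 * derived from the mean elapsed times (in seconds) of a baseline
 * run and a slowed-down run with the same number of operations.
 */
static double added_delay_ns(double base_sec, double slow_sec,
			     unsigned long nr_ops)
{
	assert(nr_ops > 0);	/* avoid dividing by zero */

	return (slow_sec - base_sec) * 1e9 / (double)nr_ops;
}
```

added_delay_ns(24.047641626, 24.313647557, 5120000) gives just under
52ns, close to the 50ns passed to ndelay(). The small excess is
expected, since ndelay() only guarantees a minimum delay.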

>
>> For the scheduler part, this test provides the module params below
>> to avoid the noise from the scheduler.
>>
>> +static int test_push_cpu;
>> +module_param(test_push_cpu, int, 0600);
>> +MODULE_PARM_DESC(test_push_cpu, "test cpu for pushing fragment");
>> +
>> +static int test_pop_cpu;
>> +module_param(test_pop_cpu, int, 0600);
>> +MODULE_PARM_DESC(test_pop_cpu, "test cpu for popping fragment");
>>
>> Or is there any better idea for testing page_frag?