Re: [RFC-PATCH 2/4] mm: Add __rcu_alloc_page_lockless() func.

From: Uladzislau Rezki
Date: Wed Sep 23 2020 - 07:27:28 EST


> > > Other approaches under consideration include making CONFIG_PREEMPT_COUNT
> > > unconditional and thus allowing call_rcu() and kvfree_rcu() to determine
> > > whether direct calls to the allocator are safe (some guy named Linus
> > > doesn't like this one),
> >
> > I assume that the primary argument is the overhead, right? Do you happen
> > to have any reference?
>
> Jon Corbet wrote a very nice article summarizing the current situation:
> https://lwn.net/Articles/831678/. Thomas's measurements show no visible
> system-level performance impact. I will let Uladzislau present his more
> microbenchmarky performance work.
>
I have done some analysis of a !PREEMPT kernel with and without the
PREEMPT_COUNT configuration option. The aim is to show the performance impact
when PREEMPT_COUNT is unconditionally enabled.

For the test I used the refscale kernel module, which does:

<snip>
static void ref_rcu_read_section(const int nloops)
{
	int i;

	for (i = nloops; i >= 0; i--) {
		rcu_read_lock();
		rcu_read_unlock();
	}
}
<snip>

How to run the microbenchmark:

<snip>
urezki@pc638:~$ sudo modprobe refscale
<snip>

Below is the average duration per loop (nanoseconds) for each run:

   !PREEMPT_COUNT         PREEMPT_COUNT
Runs    Time(ns)       Runs    Time(ns)
  1     109.640          1      99.915
  2     102.303          2     111.106
  3      90.520          3      98.713
  4     106.347          4     111.239
  5     108.374          5     111.797
  6     108.012          6     111.558
  7     103.989          7     113.122
  8     106.194          8     111.515
  9     107.330          9     107.559
 10     105.877         10     105.965
 11     104.860         11     104.835
 12     104.299         12     106.342
 13     104.794         13     106.664
 14     104.916         14     104.914
 15     105.485         15     104.280
 16     104.610         16     105.642
 17     104.981         17     105.646
 18     103.089         18     106.370
 19     105.251         19     105.284
 20     104.133         20     105.973
 21     105.589         21     105.271
 22     104.154         22     106.063
 23     104.963         23     106.248
 24     102.431         24     105.568
 25     102.610         25     105.556
 26     103.474         26     105.655
 27     100.194         27     102.887
 28     102.340         28     104.347
 29     102.075         29     102.389
 30     102.808         30     103.123

The difference is ~1.8% on average. The maximum value is 109.640 ns vs
113.122 ns; the minimum value is 90.520 ns vs 98.713 ns.

Tested on:
processor : 63
vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : QEMU Virtual CPU version 2.5+
cpu MHz : 3700.204

I can also do more detailed testing using the "perf" tool.

--
Vlad Rezki