Re: [PATCH V2 7/7] x86,rcu: use percpu rcu_preempt_depth

From: Lai Jiangshan
Date: Mon Nov 04 2019 - 06:41:32 EST




On 2019/11/4 5:25 PM, Sebastian Andrzej Siewior wrote:
> On 2019-11-02 12:45:59 [+0000], Lai Jiangshan wrote:
>> Convert x86 to use a per-cpu rcu_preempt_depth. The reason for doing so
>> is that accessing per-cpu variables is a lot cheaper than accessing
>> task_struct or thread_info variables.
>
> Is there a benchmark saying how much we gain from this?

Hello

Maybe I can write a tight loop for testing, but I don't
think anyone would be interested in it.

I'm also trying to find some good real-world tests. I would
appreciate suggestions here.


>> We need to save/restore the actual rcu_preempt_depth on context switch.
>> We also place the per-cpu rcu_preempt_depth close to the __preempt_count
>> and current_task variables.
>>
>> This follows the idea of the per-cpu __preempt_count.
>>
>> No function call when using rcu_read_[un]lock().
>> Single instruction for rcu_read_lock().
>> 2 instructions for the fast path of rcu_read_unlock().
>
> I think these were not inlined due to the header requirements.

objdump -D -S kernel/workqueue.o shows (selected fragments):


raw_cpu_add_4(__rcu_preempt_depth, 1);
d8f: 65 ff 05 00 00 00 00 incl %gs:0x0(%rip) # d96 <work_busy+0x16>

......


return GEN_UNARY_RMWcc("decl", __rcu_preempt_depth, e, __percpu_arg([var]));
dd8: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # ddf <work_busy+0x5f>
if (unlikely(rcu_preempt_depth_dec_and_test()))
ddf: 74 26 je e07 <work_busy+0x87>

......

rcu_read_unlock_special();
e07: e8 00 00 00 00 callq e0c <work_busy+0x8c>


> Boris pointed one thing, there is also DEFINE_PERCPU_RCU_PREEMP_DEPTH.


Thanks for pointing that out.

Best regards
Lai