Re: [PATCH V2 00/11] rcu/x86: Use per-cpu rcu preempt count

From: Paul E. McKenney
Date: Mon May 20 2024 - 16:03:24 EST


On Sun, Apr 07, 2024 at 05:05:47PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
>
>
> Changed from v1:
> Merge thunk_64.S and thunk_32.S into thunk.S
> Add missing #ifdef in arch/x86/kernel/cpu/common.c
>
> X86 can access percpu data in a single instruction.
>
> Use per-cpu rcu preempt count and make it able to be inlined.
>
> patch 1-8: prepare
> patch 9-11: implement PCPU_RCU_PREEMPT_COUNT
>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>

Hello, Lai!

This is really cool stuff, thank you!!!

Two big questions remain: (1) What net system-level performance benefit
is there, once the increased context-switch overhead is taken into
account, and (2) are the scheduler maintainers on board with these changes?
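To make the context-switch cost in question (1) concrete, here is a minimal
userspace sketch, assuming a single simulated CPU. With a per-CPU count,
rcu_read_lock() and rcu_read_unlock() touch only a CPU-local counter, but
every context switch must park the outgoing task's count and load the
incoming task's. All names here (struct sim_task, sim_cpu_rcu_count,
sim_rcu_preempt_switch(), and so on) are hypothetical and are not the code
in this series. The deferred "special" processing that the real unlock
path must check for is omitted.

```c
#include <assert.h>

/* Hypothetical task: holds its RCU nesting count only while switched out. */
struct sim_task {
	int saved_rcu_count;
};

/*
 * Stand-in for the per-CPU counter. On x86 a real per-CPU variable can be
 * updated with a single segment-relative instruction, which is the point
 * of the series; here it is just a global modeling one CPU.
 */
static int sim_cpu_rcu_count;

static void sim_rcu_read_lock(void)
{
	sim_cpu_rcu_count++;	/* no dereference of the current task needed */
}

static void sim_rcu_read_unlock(void)
{
	assert(sim_cpu_rcu_count > 0);	/* catch unbalanced unlocks */
	sim_cpu_rcu_count--;
	/* Real code would also check for deferred special work here. */
}

/* The added context-switch work: save outgoing count, load incoming one. */
static void sim_rcu_preempt_switch(struct sim_task *prev, struct sim_task *next)
{
	prev->saved_rcu_count = sim_cpu_rcu_count;
	sim_cpu_rcu_count = next->saved_rcu_count;
}
```

The save and restore in sim_rcu_preempt_switch() is the extra per-switch
work that has to be weighed against the cheaper lock/unlock fast paths.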

On #1, I well recall your ca. 2019 points about the improved generated
code, but I have seen cases where improved code actually ran more
slowly. My guess is that you have the best chance of seeing
system-level benefits on low-end x86 platforms, perhaps older Atom
or Celeron systems. The rcuref module provides a convenient way to run
microbenchmarks, which would be a good start. Other metrics that
might help include overall kernel code size.

On #2, good data for #1 would help greatly.

Thoughts?

Thanx, Paul

> Lai Jiangshan (11):
> lib: Use rcu_preempt_depth() to replace current->rcu_read_lock_nesting
> rcu: Move rcu_preempt_depth_set() to rcupdate.h
> rcu: Reorder tree_exp.h after tree_plugin.h
> rcu: Add macros set_rcu_preempt_special() and
> clear_rcu_preempt_special()
> rcu: Make rcu_read_unlock_special() global
> rcu: Rename macro __LINUX_RCU_H to __KERNEL_RCU_H
> sched/core: Add rcu_preempt_switch()
> x86/entry: Merge thunk_64.S and thunk_32.S into thunk.S
> rcu: Implement PCPU_RCU_PREEMPT_COUNT framework
> x86/rcu: Add rcu_preempt_count
> x86/rcu: Add THUNK rcu_read_unlock_special_thunk
>
> arch/x86/Kconfig | 1 +
> arch/x86/entry/Makefile | 2 +-
> arch/x86/entry/{thunk_64.S => thunk.S} | 5 ++
> arch/x86/entry/thunk_32.S | 18 ----
> arch/x86/include/asm/current.h | 3 +
> arch/x86/include/asm/rcu_preempt.h | 109 +++++++++++++++++++++++++
> arch/x86/kernel/cpu/common.c | 4 +
> include/linux/rcupdate.h | 36 ++++++++
> kernel/rcu/Kconfig | 8 ++
> kernel/rcu/rcu.h | 15 +++-
> kernel/rcu/tree.c | 2 +-
> kernel/rcu/tree_exp.h | 2 +-
> kernel/rcu/tree_plugin.h | 41 ++++++----
> kernel/sched/core.c | 2 +
> lib/locking-selftest.c | 6 +-
> 15 files changed, 212 insertions(+), 42 deletions(-)
> rename arch/x86/entry/{thunk_64.S => thunk.S} (72%)
> delete mode 100644 arch/x86/entry/thunk_32.S
> create mode 100644 arch/x86/include/asm/rcu_preempt.h
>
>
> base-commit: f2f80ac809875855ac843f9e5e7480604b5cbff5
> --
> 2.19.1.6.gb485710b
>