Re: [PATCH] percpu_rwsem: let percpu_rwsem writer get rwsem faster

From: Waiman Long
Date: Thu Mar 27 2025 - 23:59:08 EST



On 3/27/25 11:05 PM, Cruz Zhao wrote:
In the scenario where a large number of containers are created
at the same time, there will be a lot of tasks created in a
short time, and they will be written into cgroup.procs.

copy_process() takes the cgroup_threadgroup_rwsem read lock, while
cgroup_procs_write() takes the cgroup_threadgroup_rwsem write
lock. Readers increment read_count first and only then check
whether there are any writers, so the writer may starve,
especially when there is a steady stream of readers.

To alleviate this problem, add a check for waiting writers
before incrementing read_count, so that writers get the lock
faster.

Signed-off-by: Cruz Zhao <CruzZhao@xxxxxxxxxxxxxxxxx>
---
kernel/locking/percpu-rwsem.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 6083883c4fe0..66bf18c28b43 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -47,6 +47,11 @@ EXPORT_SYMBOL_GPL(percpu_free_rwsem);
 static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
+	if (unlikely(atomic_read_acquire(&sem->block))) {
+		rcuwait_wake_up(&sem->writer);
+		return false;
+	}
+
 	this_cpu_inc(*sem->read_count);
 	/*

The specific sequence of events is there for a reason. If we disturb the sequence like that, there is a possibility that, for example, a percpu_up_write() may miss a waiting reader. So a more careful analysis has to be done.
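
To make the ordering concern concrete, here is a minimal user-space sketch of the increment-then-check handshake, written with C11 atomics. It is a simplified model and not the kernel code: a single counter stands in for the per-CPU read_count, the slow path and wakeups are left out, and all model_* names are hypothetical. The point it illustrates is that the reader publishes its count before looking at block, and the writer publishes block before looking at the count, so with sequentially consistent operations at least one side notices the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one counter stands in for the per-CPU read_count. */
struct model_rwsem {
	atomic_int read_count;
	atomic_int block;
};

/* Reader fast path: publish the reader first, then look for a writer. */
static bool model_down_read_trylock(struct model_rwsem *sem)
{
	atomic_fetch_add(&sem->read_count, 1);

	if (!atomic_load(&sem->block))
		return true;

	/*
	 * A writer is present: back out so the count stays balanced.
	 * The real code would also prod the writer and take a slow path.
	 */
	atomic_fetch_sub(&sem->read_count, 1);
	return false;
}

static void model_up_read(struct model_rwsem *sem)
{
	atomic_fetch_sub(&sem->read_count, 1);
}

/* Writer side: publish block first, then check for active readers. */
static void model_writer_block(struct model_rwsem *sem)
{
	atomic_store(&sem->block, 1);
}

static bool model_readers_drained(struct model_rwsem *sem)
{
	return atomic_load(&sem->read_count) == 0;
}

static void model_writer_unblock(struct model_rwsem *sem)
{
	atomic_store(&sem->block, 0);
}
```

Any change that adds an early exit in front of the increment needs to be checked against both halves of this handshake, including the wakeup paths that the model omits.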

BTW, how much performance benefit did you gain by making this change? We certainly need to see some performance metrics.

The design of percpu rwsem favors readers, giving them much less performance overhead than a regular rwsem. It also assumes that writers come in only once in a while. When we need to be fairer to writers, we use a regular rwsem.
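
As a rough illustration of where the low reader overhead comes from, the sketch below models per-CPU reader counts in user-space C. It is an assumption-laden toy: the slot array, MODEL_NR_CPUS, and the model_* helpers are hypothetical names, the cpu argument is passed in by hand, and memory ordering is deliberately ignored. A reader only touches the slot of the CPU it runs on, while the writer has to sum every slot, which is why the scheme pays off only when writers are rare.

```c
#include <stdatomic.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* One counter per CPU; only the sum over all slots is meaningful. */
struct model_percpu_count {
	atomic_int slot[MODEL_NR_CPUS];
};

/* Reader entry: touch only the local slot. */
static void model_read_enter(struct model_percpu_count *c, int cpu)
{
	atomic_fetch_add_explicit(&c->slot[cpu], 1, memory_order_relaxed);
}

/* Reader exit: may run on a different CPU than the matching entry. */
static void model_read_exit(struct model_percpu_count *c, int cpu)
{
	atomic_fetch_sub_explicit(&c->slot[cpu], 1, memory_order_relaxed);
}

/* Writer side: walk every slot, so the cost grows with the CPU count. */
static int model_active_readers(struct model_percpu_count *c)
{
	int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += atomic_load_explicit(&c->slot[cpu], memory_order_relaxed);

	return sum;
}
```

In this model a reader that enters on one CPU and exits on another leaves one slot positive and another negative, yet the sum still returns to zero.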

Cheers,
Longman