[PATCH v2 rcu-dev 3/3] rcu/tree: Count number of batched kfree_rcu() locklessly

From: Joel Fernandes (Google)
Date: Mon Mar 16 2020 - 12:32:44 EST


We can relax the accuracy of counting the number of queued objects in
favor of not hurting performance, by sampling the per-cpu counters
locklessly. This should be OK since, under high memory pressure, it
does not matter if we are off by a few objects while counting. The
shrinker will still do the reclaim.
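
To illustrate the pattern, here is a minimal userspace sketch of the
lockless sampling this patch switches to. READ_ONCE()/WRITE_ONCE() are
modeled with volatile accesses, which is essentially what the kernel
macros do; NR_CPUS, krc_count[] and the function names are illustrative
stand-ins, not the actual kfree_rcu_cpu machinery.

	#include <stdio.h>

	#define NR_CPUS 4

	/* Userspace stand-ins for the kernel's one-time-access macros. */
	#define WRITE_ONCE(x, val) (*(volatile typeof(x) *)&(x) = (val))
	#define READ_ONCE(x)       (*(volatile typeof(x) *)&(x))

	static unsigned long krc_count[NR_CPUS]; /* stand-in for krcp->count */

	/* Writer side: each CPU updates only its own counter. In the real
	 * code the writer still holds krcp->lock against other writers;
	 * WRITE_ONCE() only annotates the store for the lockless reader. */
	static void enqueue_on(int cpu)
	{
		WRITE_ONCE(krc_count[cpu], krc_count[cpu] + 1);
	}

	/* Reader side: the shrinker sums all counters without taking any
	 * per-CPU lock; a slightly stale total is acceptable. */
	static unsigned long sample_total(void)
	{
		unsigned long count = 0;

		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			count += READ_ONCE(krc_count[cpu]);

		return count;
	}

	int main(void)
	{
		enqueue_on(0);
		enqueue_on(0);
		enqueue_on(2);
		printf("approximate total: %lu\n", sample_total());
		return 0;
	}

Since the increment still runs under krcp->lock, the marked accesses do
not add ordering; they keep the racing read in the shrinker from being
torn or flagged as a data race by tools such as KCSAN.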

Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

---
kernel/rcu/tree.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index dc570dff68d7b..875e7162ddcce 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2916,7 +2916,7 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
krcp->head = NULL;
}

- krcp->count = 0;
+ WRITE_ONCE(krcp->count, 0);

/*
* One work is per one batch, so there are two "free channels",
@@ -3054,7 +3054,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
krcp->head = head;
}

- krcp->count++;
+ WRITE_ONCE(krcp->count, krcp->count + 1);

// Set timer to drain after KFREE_DRAIN_JIFFIES.
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
@@ -3080,9 +3080,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
for_each_online_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

- spin_lock_irqsave(&krcp->lock, flags);
- count += krcp->count;
- spin_unlock_irqrestore(&krcp->lock, flags);
+ count += READ_ONCE(krcp->count);
}

return count;
--
2.25.1.481.gfbce0eb801-goog