Re: [PATCH] rcu: shrink each possible cpu krcp

From: Paul E. McKenney
Date: Tue Aug 18 2020 - 13:18:21 EST


On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> >
> > > From: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > >
> > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > func, if the shrinker is triggered at this time, we should drain each
> > > possible cpu "krcp".
> > >
> > > Signed-off-by: Zqiang <qiang.zhang@xxxxxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > unsigned long count = 0;
> > >
> > > /* Snapshot count of all CPUs */
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > count += READ_ONCE(krcp->count);
> > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > int cpu, freed = 0;
> > > unsigned long flags;
> > >
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > int count;
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > int cpu;
> > > unsigned long flags;
> > >
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > >
> > I agree that it can happen.
> >
> > Joel, what is your view?
>
> Yes I also think it is possible. The patch LGTM. Another fix could be
> to drain the caches in the CPU offline path and save the memory. But
> then it will take hit during __get_free_page(). If CPU
> offlining/online is not frequent, then it will save the lost memory.
>
> I wonder how other per-cpu caches in the kernel work in such scenarios.
>
> Thoughts?

Do I count this as an ack or a review? If not, what precisely would
you like the submitter to do differently?

Thanx, Paul