Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value

From: Paul E. McKenney
Date: Wed Jun 29 2022 - 12:52:56 EST


On Tue, Jun 28, 2022 at 04:56:14PM +0000, Joel Fernandes wrote:
> On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > should return the number of freeable items, but if there are no objects
> > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > cache should be skipped for another reason.
> > > > > >
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > > kernel/rcu/tree.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > }
> > > > > >
> > > > > > - return count;
> > > > > > + return count == 0 ? SHRINK_EMPTY : count;
> > > > > > }
> > > > > >
> > > > > > static unsigned long
> > > > > > --
> > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > >
> > > > > Looks good to me!
> > > > >
> > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > >
> > > > Now that you mention it, this does look independent of the rest of
> > > > the series. I have pulled it in with Uladzislau's Reviewed-by.
> > >
> > > Thanks Paul and Vlad!
> > >
> > > Paul, apologies for being quiet. I have been working on the series and the
> > > review comments carefully. I appreciate your help with this work.
> >
> > Not a problem. After all, this stuff is changing some of the trickier
> > parts of RCU. We must therefore assume that some significant time and
> > effort will be required to get it right.
>
> To your point about trickier parts of RCU, the v2 series though I tested it
> before submitting is now giving me strange results with rcuscale. Sometimes
> laziness does not seem to be in effect (as pointed out by rcuscale), other
> times I am seeing stalls.
>
> So I have to carefully look through all of this again. I am not sure why I
> was not seeing these issues with the exact same code before (frustrated).

This is one of the mechanisms behind that famous Brian Kerghnihan saying
about code being three times harder to debug than to write. You see,
when you are writing the code, you only need to deal with that part of
the state space that you are aware of. When you are debugging code,
the rest of the state space makes its presence known.

That is, if you are lucky.

If you are not so lucky, the rest of the state space waits to make
its presence known until your code is running some critical workload
in production. ;-)

Thanx, Paul