Re: [PATCH v3 rcu-dev] rcuperf: Measure memory footprint during kfree_rcu() test

From: Paul E. McKenney
Date: Wed Jan 15 2020 - 19:01:10 EST


On Wed, Jan 15, 2020 at 05:45:42PM -0500, Joel Fernandes wrote:
> On Wed, Jan 15, 2020 at 02:42:51PM -0800, Paul E. McKenney wrote:
> > > [snip]
> > > > > We can certainly refine it further but at this time I am thinking of spending
> > > > > my time reviewing Lai's patches and learning some other RCU things I need to
> > > > > catch up on. If you hate this patch too much, we can also defer this patch
> > > > > review for a bit and I can carry it in my tree for now as it is only a patch
> > > > > to test code. But honestly, in its current form I am sort of happy with it.
> > > >
> > > > OK, I will keep it as is for now and let's look again later on. It is not
> > > > in the bucket for the upcoming merge window in any case, so we do have
> > > > quite a bit of time.
> > > >
> > > > It is not that I hate it, but rather that I want to be able to give
> > > > good answers to questions that might come up. And given that I have
> > > > occasionally given certain people a hard time about their statistics,
> > > > it is only reasonable to expect them to return the favor. I wouldn't
> > > > want you to be caught in the crossfire. ;-)
> > >
> > > Since the weights were concerning, I was thinking of just using a weight of
> > > (1 / N) where N is the number of samples. Essentially taking the average.
> > > That could be simple enough and does not cause your concerns with weight
> > > tuning. I tested it and looks good, I'll post it shortly.
> >
> > YES!!! ;-)
> >
> > Snapshot mem_begin before entering the loop. For the mean value to
> > be solid, you need at least 20-30 samples, which might mean upping the
> > default for kfree_loops. Have an "unsigned long long" to accumulate the
> > sum, which should avoid any possibility of overflow for current systems
> > and for all systems for kfree_loops less than PAGE_SIZE. At which point,
> > forget the "%" stuff and just sum up the si_mem_available() on each pass
> > through the loop.
> >
> > Do the division on exit from the loop, preferably checking for divide
> > by zero.
> >
> > Straightforward, fast, reasonably reliable, and easy to defend.
>
> I mostly did it along these lines. Hopefully the latest posting is reasonable
> enough ;-) I sent it twice because I messed up the authorship (sorry).

No problem with the authorship-fix resend!

But let's get this patch consistent with basic statistics!

You really do need 20-30 samples for an average to mean much.

Of course, right now you default kfree_loops to 10. You are doing
8000 kmalloc()/kfree_rcu() loops on each pass. This is large enough
that just dropping the "% 4" should be just fine from the viewpoint of
si_mem_available() overhead. But 8000 allocations and frees should get
done in way less than one second, so kicking the default kfree_loops up
to 30 should be a non-problem.

Then the patch would be both simpler and statistically valid.
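
For concreteness, the sample-every-pass approach might look something like
the userspace sketch below. This is only an illustration under stated
assumptions, not the actual rcuperf code: fake_mem_available() is a made-up
stand-in for si_mem_available(), and mem_avg() and sample_mem_avg() are
hypothetical names.

```c
/*
 * Hypothetical stand-in for the kernel's si_mem_available(), which
 * returns an estimate of available memory in pages.  The values here
 * are invented so that the sketch is deterministic.
 */
static long fake_mem_available(int pass)
{
	return 100000 - 100 * pass;
}

/* Do the division once, on exit from the loop, guarding against zero. */
static unsigned long long mem_avg(unsigned long long sum, int nr_samples)
{
	return nr_samples > 0 ? sum / nr_samples : 0;
}

/* Take one sample per pass, with no "% 4" filtering, and average them. */
static unsigned long long sample_mem_avg(int kfree_loops)
{
	unsigned long long sum = 0;	/* Wide accumulator to avoid overflow. */
	int i;

	for (i = 0; i < kfree_loops; i++) {
		/* The kmalloc()/kfree_rcu() batch for this pass would run here. */
		sum += fake_mem_available(i);
	}
	return mem_avg(sum, kfree_loops);
}
```

With kfree_loops at 30, sample_mem_avg(30) averages 30 samples, and a
kfree_loops of zero yields zero rather than a divide-by-zero splat.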

So could you please stop using me as the middleman in your fight with
the laws of mathematics and get this patch to a defensible state? ;-)

Thanx, Paul