Re: [RFC][PATCH] make module refcounts use percpu_counters
From: Rusty Russell
Date: Tue Oct 02 2007 - 00:20:58 EST
On Mon, 2007-10-01 at 10:03 -0700, Dave Hansen wrote:
> On Mon, 2007-10-01 at 19:43 +1000, Rusty Russell wrote:
> > On Fri, 2007-09-28 at 16:00 -0700, Dave Hansen wrote:
> > > Module refcounts currently use a percpu counter stored
> > > in the 'struct module'. However, we also have a more
> > > generic implementation that does stuff like handle
> > > hotplug cpus.
> > >
> > > I'm not actually all that convinced that this refcount
> > > actually does a lot of good, with cpus racing bumping
> > > the counters at the same time that they're being
> > > summed up. But, it certainly isn't any worse than
> > > what was there before.
> >
> > That's why we look at the counters inside stop_machine_run().
>
> Ahhh. That makes sense. Although it wasn't apparent during my 3-second
> perusal of the code.
>
> > Note that (1) the module implementation handles hotplug CPUs
>
>
> You're saying it handles hotplug because of stop_machine_run()?
No, it handles hotplug because it does every possible CPU, not every
online CPU. percpu_counters empties cpu's counter presumably to avoid
systematic error.
Renaming percpu_counters to approximate_counters here would be nice.
> > But it might be a useful cleanup (although a slight de-optimization).
> > If you want I'll queue for 2.6.24 (there are several other module
> > patches pending too).
>
> Might as well. It removed a very small amount of code, and opens the
> door a bit for future optimizations in a single place.
You missed removing struct module_ref, too. That's a little more code.
> > In an ideal world, (1) we would have percpu pointers using the same
> > percpu mechanism as percpu variables, (2) we would have a modal variant
> > of percpu counters which would collapse to a single counter when we
> > cared about the precise value (probably using stop_machine for the
> > transition). This would be useful for many other cases.
>
> Yeah, but before we do that, we need some kind of flag to get the
> percpu_counter_mod() fast path shoved into the slow path that takes the
> lock.
Well, there is already a branch in the fast path. BTW, comparing before
and after applying your patch for a try_module_get/module_put pair, I
get 5.9 ns vs 20.6 ns. We perhaps win something back on NUMA-like
machines, but it's not clear.
My initial implementation of such a counter used atomic ops via a
pointer. The pointer was aimed at a shared counter for slow-mode. The
problem is that you need to disable preemption around the counter update
(so you can use rcu to ensure everyone has seen the changeover).
> I'm not sure the stop_machine() mechanism will work very well if we try
> to expand this much further for other users. What would the SGI guys
> think if these happened more than once in a blue moon?
Indeed, that's why I called it "stop_machine". The real-time coders
hate it too.
Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/