Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist

From: Russell King - ARM Linux
Date: Thu Jan 05 2012 - 09:41:03 EST


On Thu, Jan 05, 2012 at 02:20:17PM +0000, Mel Gorman wrote:
> On Tue, Jan 03, 2012 at 12:45:45PM -0500, KOSAKI Motohiro wrote:
> > > void drain_all_pages(void)
> > > {
> > > - on_each_cpu(drain_local_pages, NULL, 1);
> > > + int cpu;
> > > + struct per_cpu_pageset *pcp;
> > > + struct zone *zone;
> > > +
> >
> > get_online_cpus() ?
> >
>
> Just a separate note;
>
> I'm looking at some mysterious CPU hotplug problems that only happen
> under heavy load. My strongest suspicion at the moment is that the
> problem is related to on_each_cpu() being used without
> get_online_cpus(), but you cannot simply call get_online_cpus() in
> this path without causing deadlock.

Mel,

That's a known hotplug problem. PeterZ has a patch which (probably)
solves it, but there seems to be very little traction of any kind to
merge it. I've been chasing that patch and getting no replies whatsoever
from folk like Peter, Thomas and Ingo.

The problem affects all IPI-raising functions, which mask with
cpu_online_mask directly.

I'm not sure that smp_call_function() can use get_online_cpus(), as it
looks like it's not permitted to sleep (if it has to wait for the
called function to complete on all CPUs, it spins in csd_lock_wait()
rather than using a sleepable completion). get_online_cpus() solves
the online mask problem by sleeping until it's safe to access it.
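
To make the spin-versus-sleep distinction concrete, here is a rough
userspace sketch. It is not kernel code: every fake_* name and
demo_waits() are hypothetical stand-ins, with a pthread condition
variable playing the part of a sleepable completion and an atomic flag
playing the part of the lock that csd_lock_wait spins on.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag a csd_lock_wait-style loop polls. */
struct fake_csd {
	atomic_bool locked;
};

/* Hypothetical stand-in for a sleepable completion. */
struct fake_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

struct demo_ctx {
	struct fake_csd csd;
	struct fake_completion comp;
};

/* Busy-wait: the caller never blocks, so this is usable where sleeping is forbidden. */
static void fake_csd_lock_wait(struct fake_csd *csd)
{
	while (atomic_load_explicit(&csd->locked, memory_order_acquire))
		; /* spin */
}

/* Blocking wait: the caller may be put to sleep until fake_complete() runs. */
static void fake_wait_for_completion(struct fake_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

static void fake_complete(struct fake_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Plays the remote CPU: finishes the "called function" and releases both waiters. */
static void *remote_cpu(void *arg)
{
	struct demo_ctx *ctx = arg;

	atomic_store_explicit(&ctx->csd.locked, false, memory_order_release);
	fake_complete(&ctx->comp);
	return NULL;
}

/* Returns 0 once both kinds of wait have been satisfied, -1 on thread creation failure. */
int demo_waits(void)
{
	struct demo_ctx ctx;
	pthread_t thread;

	atomic_init(&ctx.csd.locked, true);
	pthread_mutex_init(&ctx.comp.lock, NULL);
	pthread_cond_init(&ctx.comp.cond, NULL);
	ctx.comp.done = false;

	if (pthread_create(&thread, NULL, remote_cpu, &ctx) != 0)
		return -1;

	fake_csd_lock_wait(&ctx.csd);        /* spins, never sleeps */
	fake_wait_for_completion(&ctx.comp); /* may sleep */

	pthread_join(thread, NULL);
	pthread_cond_destroy(&ctx.comp.cond);
	pthread_mutex_destroy(&ctx.comp.lock);
	return 0;
}
```

Calling demo_waits() from main() and building with -pthread should be
enough to try it. The only point is that the first wait burns CPU
without ever blocking, while the second can block, which is what a
no-sleep context cannot tolerate.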

So, I think this whole CPU bringup mess needs to be re-thought, and
the seemingly constant urge to pile more and more restrictions onto the
bringup path needs resolving. It's got to the point where there are
so many restrictions that it's actually impossible for arch code to
simultaneously satisfy them all.