Re: workqueue deadlock

From: Andrew Morton
Date: Sun Dec 10 2006 - 07:17:10 EST


On Sun, 10 Dec 2006 12:49:43 +0100
Ingo Molnar <mingo@xxxxxxx> wrote:

> > > void cpu_hotplug_lock(void)

This is actually not cpu-hotplug safe ;)

> > > {
> > > 	int cpu = raw_smp_processor_id();
> > > 	/*
> > > 	 * Interrupts/softirqs are hotplug-safe:
> > > 	 */
> > > 	if (in_interrupt())
> > > 		return;
> > > 	if (current->hotplug_depth++)
> > > 		return;

<preempt, cpu hot-unplug, resume on different CPU>

> > > 	current->hotplug_lock = &per_cpu(hotplug_lock, cpu);

<use-after-free>

> > > 	mutex_lock(current->hotplug_lock);

And mutex_lock() sleeps, so we can't use preempt_disable() here.

> > > }

It's worth noting that this very common sequence:

	preempt_disable();
	cpu = smp_processor_id();
	...
	preempt_enable();

also provides cpu-hotunplug protection against scenarios such as the above.
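
To make that concrete, here is a toy userspace model, not kernel code. Every
model_* name is made up for illustration. It assumes hot-unplug goes through
stop_machine_run(), which can't complete until every CPU has scheduled, so a
preempt-disabled section holds it off:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Toy stand-ins for kernel state; none of this is real kernel API. */
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_current_cpu = 1;	/* CPU the single modelled task runs on */
static int model_preempt_count;		/* > 0 means preemption is disabled */

void model_preempt_disable(void) { model_preempt_count++; }
void model_preempt_enable(void) { model_preempt_count--; }
int model_smp_processor_id(void) { return model_current_cpu; }

/*
 * Assumed model of hot-unplug: it cannot complete while the task has
 * preemption disabled. Otherwise the CPU goes offline and, if the task
 * was running there, the task resumes on another online CPU.
 * Returns true if the CPU was taken offline.
 */
bool model_cpu_down(int cpu)
{
	int other;

	if (model_preempt_count > 0)
		return false;
	model_cpu_online[cpu] = false;
	if (cpu == model_current_cpu) {
		for (other = 0; other < MODEL_NR_CPUS; other++) {
			if (model_cpu_online[other]) {
				model_current_cpu = other;
				break;
			}
		}
	}
	return true;
}

/* Unprotected: the unplug lands between reading the id and using it. */
bool model_racy_cpu_still_valid(void)
{
	int cpu = model_smp_processor_id();

	model_cpu_down(cpu);	/* stands in for <preempt, cpu hot-unplug> */
	return model_cpu_online[cpu] && cpu == model_smp_processor_id();
}

/* Protected: the same unplug attempt is held off until preempt_enable(). */
bool model_safe_cpu_still_valid(void)
{
	bool valid;
	int cpu;

	model_preempt_disable();
	cpu = model_smp_processor_id();
	model_cpu_down(cpu);	/* refused while preemption is off */
	valid = model_cpu_online[cpu] && cpu == model_smp_processor_id();
	model_preempt_enable();
	return valid;
}
```

Calling model_racy_cpu_still_valid() and then model_safe_cpu_still_valid()
should give false and then true: in the first case the task ends up holding
the id of a CPU which is gone, in the second the unplug has to wait.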

> > That's functionally equivalent to what we have now, and it isn't
> > working too well.
>
> hm, i thought the main reason of not using cpu_hotplug_lock() in a
> widespread manner was not related to its functionality but to its
> scalability - but i could be wrong.

The scalability problem hasn't been noticed yet.

I suspect a large part of the reason for that is that we only really care
about hot-unplug when this CPU reaches across to some other CPU's data. Often
_all_ other CPUs' data. And that's a super-inefficient thing, so it's rare.

Most of the problems we've had are due to borkage in cpufreq. And that's
simply cruddy code - it's not due to the complexity of CPU hotplug per se.

> The one above is scalable and we
> could use it as /the/ method to control CPU hotplug. All the flux i
> remember related to cpu_hotplug_lock() use from the fork path and from
> other scheduler hotpaths related to its scalability bottleneck, not to
> its locking efficiency.

One quite different way of addressing all of this is to stop using
stop_machine_run() for hotplug synchronisation and switch to the swsusp
freezer infrastructure: all kernel threads and user processes need to stop
and park themselves in a known state before we allow the CPU to be removed.
lock_cpu_hotplug() becomes a no-op.

Dunno if it'll work - I only just thought of it. It sure would simplify
things.
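
Purely to illustrate the shape of the idea, a rough userspace model. None of
the frz_* names correspond to real freezer or hotplug interfaces, and it
assumes CPU 0 always stays up:

```c
#include <assert.h>
#include <stdbool.h>

#define FRZ_NR_TASKS 3
#define FRZ_NR_CPUS 2

/* Toy model only: these names do not correspond to real freezer APIs. */
struct frz_task {
	bool frozen;	/* parked in a known state */
	int cpu;	/* CPU the task last ran on */
};

static struct frz_task frz_tasks[FRZ_NR_TASKS] = {
	{ false, 0 }, { false, 1 }, { false, 1 },
};
static bool frz_cpu_online[FRZ_NR_CPUS] = { true, true };

/* If every task is parked during removal, there is nothing left to lock. */
void frz_lock_cpu_hotplug(void) { }

/* Park (true) or release (false) every modelled task. */
void frz_set_all(bool frozen)
{
	int i;

	for (i = 0; i < FRZ_NR_TASKS; i++)
		frz_tasks[i].frozen = frozen;
}

bool frz_all_frozen(void)
{
	int i;

	for (i = 0; i < FRZ_NR_TASKS; i++)
		if (!frz_tasks[i].frozen)
			return false;
	return true;
}

/* Refuses to run unless every task is parked. */
bool frz_remove_cpu(int cpu)
{
	int i;

	if (!frz_all_frozen())
		return false;
	for (i = 0; i < FRZ_NR_TASKS; i++)
		if (frz_tasks[i].cpu == cpu)
			frz_tasks[i].cpu = 0;	/* assumption: CPU 0 stays up */
	frz_cpu_online[cpu] = false;
	return true;
}

/* Freeze everything, remove the CPU, thaw everything. */
bool frz_hot_unplug(int cpu)
{
	bool removed;

	frz_set_all(true);
	removed = frz_remove_cpu(cpu);
	frz_set_all(false);
	return removed;
}
```

In this model frz_remove_cpu() on its own fails while anything is running,
and frz_hot_unplug() succeeds with no lock taken by the tasks at all.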