Re: [PATCH v3] CPU hotplug: active_writer not woken up in some cases - deadlock

From: David Hildenbrand
Date: Wed Dec 10 2014 - 02:56:42 EST


> (sorry if this was already discussed, I ignored most of my emails
> I got this week)
>
> On 12/09, David Hildenbrand wrote:
> >
> > @@ -116,7 +118,13 @@ void put_online_cpus(void)
> > if (cpu_hotplug.active_writer == current)
> > return;
> > if (!mutex_trylock(&cpu_hotplug.lock)) {
> > + /* inc before testing for active_writer to not lose wake ups */
> > atomic_inc(&cpu_hotplug.puts_pending);
> > + spin_lock(&cpu_hotplug.awr_lock);
> > + /* we might be the last one */
> > + if (unlikely(cpu_hotplug.active_writer))
> > + wake_up_process(cpu_hotplug.active_writer);
> > + spin_unlock(&cpu_hotplug.awr_lock);
>
> Not sure I understand. awr_lock can only ensure that active_writer
> can't go away.

This solution is not optimal but works without races ... I'll try to get
something with wait queues running and/or even change the way refcount is
accessed as suggested by you.

And yes, awr_lock will only ensure that active_writer won't go away.

>
> Why active_writer should see .puts_pending != 0 if this is called
> right after cpu_hotplug_begin() takes cpu_hotplug.lock but before
> it sets TASK_UNINTERRUPTIBLE?

get_online_cpus() increased the refcount.
put_online_cpus() will increment puts_pending and trigger a wake up (if the
lock is alread taken - might be by cpu_hotplug_begin() or by some other
get_online_cpus()).

So refcount == 1, puts_pending == 1

cpu_hotplug_begin() gets the lock and sees refcount == 1 and puts_pending == 0
or puts_pending == 1 (race with put_online_cpus()).

If that answers your question :)

>
> IOW,
>
> > void cpu_hotplug_begin(void)
> > {
> > + spin_lock(&cpu_hotplug.awr_lock);
> > cpu_hotplug.active_writer = current;
> > + spin_unlock(&cpu_hotplug.awr_lock);
> >
> > cpuhp_lock_acquire();
> > for (;;) {
> > mutex_lock(&cpu_hotplug.lock);
> > + __set_current_state(TASK_UNINTERRUPTIBLE);
>
> don't we need set_current_state() here ?

Hm, good question, this was only a move of existing code. But I thing the
checked variant would be better.

>
> Oleg.
>

Thanks!

David

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/