Re: [PATCH 5/8] cpu: cpu-hotplug deadlock

From: Gautham R Shenoy
Date: Wed Apr 30 2008 - 01:37:28 EST


On Tue, Apr 29, 2008 at 06:33:50PM +0400, Oleg Nesterov wrote:
> On 04/29, Gautham R Shenoy wrote:
> >
> > cpu_hotplug.mutex is basically a lock-internal lock; but by keeping it locked
> > over the 'write' section (cpu_hotplug_begin/done), a lock inversion happens when
> > some of the write-side code calls into code that would otherwise take a
> > read lock.
> >
> > And it so happens that read-in-write recursion is expressly permitted.
> >
> > Fix this by turning cpu_hotplug into a proper stand-alone, unfair reader/writer
> > lock that allows reader-in-reader and reader-in-writer recursion.
>
> While the patch itself is very clean and understandable, I can't understand
> the changelog ;)
>
> Could you explain what the semantic difference is? The current code allows
> read-in-write recursion too.

How about the following?
----------------------------------------------------------------------------
cpu_hotplug: split the cpu_hotplug.lock mutex into a spinlock and a waitqueue.

Consider the following call path:
get_online_cpus()
.
.
.
some_fn() --> takes some_lock; /* lock_acquire(some_lock) [1] */
.
.
.
get_online_cpus(); /* lock_acquire(cpu_hotplug.lock) [2] */

and on the cpu-hotplug callback path we have:
cpu_hotplug.lock /* lock_acquire(cpu_hotplug.lock) [3] */
.
.
.
some_other_fn() --> takes some_lock; /* lock_acquire(some_lock) [4] */

lockdep will treat this as an ABBA possibility, since on the write path we
currently hold the lock so that the readers block on it.

However, the write path cannot proceed until the refcount drops to zero,
which means that lockdep's complaint at [2] is a false positive.

Hence we need to split the mutex into a separate spinlock under which the
refcount can be manipulated, introduce a waitqueue on which the readers can
block during an ongoing cpu-hotplug operation, and provide the lockdep
annotation only when a reader is about to block.
----------------------------------------------------------------------------
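
To make that concrete, the reader side could look roughly like the sketch
below. This is untested and illustrative only: the field names mirror the
patch, but the exact layout, the writer wakeup in put_online_cpus(), and
the spot marked for the lockdep annotation are my approximations, not the
patch itself.

#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/sched.h>

static struct {
	spinlock_t lock;			/* protects refcount */
	int refcount;				/* number of active readers */
	struct task_struct *active_writer;	/* task doing cpu-hotplug */
	wait_queue_head_t reader_queue;		/* readers block here */
	wait_queue_head_t writer_queue;		/* the writer blocks here */
} cpu_hotplug;

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;			/* read-in-write recursion is fine */

	spin_lock(&cpu_hotplug.lock);
	while (cpu_hotplug.active_writer) {
		DEFINE_WAIT(wait);

		/*
		 * Only here, when we are genuinely about to block behind
		 * an ongoing hotplug operation, would the lockdep
		 * annotation go; the fast path above takes no sleeping
		 * lock, so [2] no longer looks like ABBA.
		 */
		prepare_to_wait(&cpu_hotplug.reader_queue, &wait,
				TASK_UNINTERRUPTIBLE);
		spin_unlock(&cpu_hotplug.lock);
		schedule();
		finish_wait(&cpu_hotplug.reader_queue, &wait);
		spin_lock(&cpu_hotplug.lock);
	}
	cpu_hotplug.refcount++;
	spin_unlock(&cpu_hotplug.lock);
}

void put_online_cpus(void)
{
	if (cpu_hotplug.active_writer == current)
		return;

	spin_lock(&cpu_hotplug.lock);
	if (!--cpu_hotplug.refcount && cpu_hotplug.active_writer)
		wake_up(&cpu_hotplug.writer_queue);
	spin_unlock(&cpu_hotplug.lock);
}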

>
> The only difference I can see is that now cpu_hotplug_begin() doesn't rely
> on cpu_add_remove_lock any longer (currently the caller must hold this lock),
> but this (good) change is not documented.
>
> > static void cpu_hotplug_done(void)
> > {
> > +	spin_lock(&cpu_hotplug.lock);
> > 	cpu_hotplug.active_writer = NULL;
> > -	mutex_unlock(&cpu_hotplug.lock);
> > +	if (!list_empty(&cpu_hotplug.writer_queue.task_list))
>
> waitqueue_active() ?
>
> > +		wake_up(&cpu_hotplug.writer_queue);
> > +	else
> > +		wake_up_all(&cpu_hotplug.reader_queue);
>
> Please note that wake_up() and wake_up_all() don't differ here, because
> cpu_hotplug_begin() uses prepare_to_wait(), not prepare_to_wait_exclusive().
> I'd suggest changing cpu_hotplug_begin() and using wake_up() for both
> cases.
>
> (actually, since write-locks should be very rare, perhaps we don't need
> two wait queues?)
>
> Oleg.
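
For illustration, queueing the writer as an exclusive waiter would make a
plain wake_up() wake exactly one task, which is what makes the
wake_up()/wake_up_all() distinction in cpu_hotplug_done() meaningful. An
untested sketch along those lines, reusing the illustrative structure from
above:

static void cpu_hotplug_begin(void)
{
	DEFINE_WAIT(wait);

	spin_lock(&cpu_hotplug.lock);
	cpu_hotplug.active_writer = current;
	while (cpu_hotplug.refcount) {
		/*
		 * Exclusive waiters are woken one at a time, so a plain
		 * wake_up() from the last reader suffices here.
		 */
		prepare_to_wait_exclusive(&cpu_hotplug.writer_queue, &wait,
					  TASK_UNINTERRUPTIBLE);
		spin_unlock(&cpu_hotplug.lock);
		schedule();
		finish_wait(&cpu_hotplug.writer_queue, &wait);
		spin_lock(&cpu_hotplug.lock);
	}
	spin_unlock(&cpu_hotplug.lock);
}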

--
Thanks and Regards
gautham