Re: [PATCH RFC tip/core/rcu] Eliminate deadlock between CPU hotplug and expedited grace periods

From: Paul E. McKenney
Date: Thu Sep 18 2014 - 18:57:52 EST


On Fri, Sep 19, 2014 at 12:55:11AM +0200, Rafael J. Wysocki wrote:
> On Thursday, September 18, 2014 05:38:45 AM Paul E. McKenney wrote:
> > On Thu, Sep 18, 2014 at 03:15:36PM +0800, Lan Tianyu wrote:
> > > On 2014-09-17 21:10, Paul E. McKenney wrote:
> > > > On Wed, Sep 17, 2014 at 03:11:42PM +0800, Lan Tianyu wrote:
> > > >> On 2014-08-29 03:47, Paul E. McKenney wrote:
> > > >>> Currently, the expedited grace-period primitives do get_online_cpus().
> > > >>> This greatly simplifies their implementation, but means that calls to
> > > >>> them holding locks that are acquired by CPU-hotplug notifiers (to say
> > > >>> nothing of calls to these primitives from CPU-hotplug notifiers) can
> > > >>> deadlock. But this is starting to become inconvenient:
> > > >>> https://lkml.org/lkml/2014/8/5/754
> > > >>>
> > > >>> This commit avoids the deadlock and retains the simplicity by creating
> > > >>> a try_get_online_cpus(), which returns false if the get_online_cpus()
> > > >>> reference count could not immediately be incremented. If a call to
> > > >>> try_get_online_cpus() returns true, the expedited primitives operate
> > > >>> as before. If a call returns false, the expedited primitives fall back
> > > >>> to normal grace-period operations. This falling back of course results
> > > >>> in increased grace-period latency, but only during times when CPU
> > > >>> hotplug operations are actually in flight. The effect should therefore
> > > >>> be negligible during normal operation.
> > > >>>
> > > >>> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > >>> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > > >>> Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> > > >>> Cc: Lan Tianyu <tianyu.lan@xxxxxxxxx>
> > > >>
> > > >> Hi Paul:
> > > >> What's the status of the patch? Will you push it? Thanks.
> > > >
> > > > By default, it would go into 3.19. Do you need it earlier?
> > >
> > > IMO, this is a deadlock bug that is hard to reproduce, so should the
> > > patch go into v3.17 and the stable tree?
> >
> > The problem with pushing for v3.17 is that I would have to rebase
> > that commit to the bottom of my current stack and redo all my testing.
> > If there were any problems, I could not only miss v3.17, but also miss
> > the v3.18 merge window.
> >
> > So, given that the next merge window happens pretty soon, how about
> > v3.18 and the stable tree?
>
> That sounds good to me.

Very good, I have added it to my v3.18 queue.

Thanx, Paul
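
For anyone reading along, the fallback described in the quoted commit log
can be sketched in user space roughly as follows. This is only a minimal
model under stated assumptions, not the kernel code: every model_* name,
the hotplug_state counter, and the gp_path enum are hypothetical stand-ins
invented for illustration. The one idea taken from the commit log is that
a "try" variant returns false instead of waiting, and the expedited path
then falls back to a normal grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model state (assumption, not the kernel's data structure):
 *   >= 0 : number of readers currently holding off CPU hotplug
 *   -1   : a CPU-hotplug operation is in flight
 */
static atomic_int hotplug_state;

enum gp_path { GP_EXPEDITED, GP_NORMAL };

/* Stand-in for try_get_online_cpus(): never waits, may return false. */
static bool model_try_get_online_cpus(void)
{
	int old = atomic_load(&hotplug_state);

	while (old >= 0) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&hotplug_state, &old, old + 1))
			return true;
	}
	return false;	/* hotplug in flight: could not take a reference */
}

/* Stand-in for put_online_cpus(): drop the reference taken above. */
static void model_put_online_cpus(void)
{
	atomic_fetch_sub(&hotplug_state, 1);
}

/* Model of a hotplug operation starting; succeeds only with no readers. */
static bool model_hotplug_begin(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&hotplug_state, &expected, -1);
}

/* Model of the hotplug operation finishing. */
static void model_hotplug_end(void)
{
	atomic_store(&hotplug_state, 0);
}

/* Model of an expedited primitive with the fallback from the commit log. */
static enum gp_path model_synchronize_expedited(void)
{
	if (!model_try_get_online_cpus())
		return GP_NORMAL;	/* slower, but cannot deadlock here */

	/* ... the expedited grace-period work would go here ... */
	model_put_online_cpus();
	return GP_EXPEDITED;
}
```

In this model the expedited path is taken whenever no hotplug operation
is in flight, and the normal path is taken only between
model_hotplug_begin() and model_hotplug_end(), which mirrors the claim
that the added latency is confined to times when CPU hotplug is active.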
