Re: WARNING: possible circular locking dependency detected

From: Thomas Gleixner
Date: Thu Aug 31 2017 - 03:56:05 EST


On Thu, 31 Aug 2017, Peter Zijlstra wrote:
> On Thu, Aug 31, 2017 at 09:08:05AM +0200, Thomas Gleixner wrote:
> > On Wed, 30 Aug 2017, Peter Zijlstra wrote:
> > > On offline it basically does perf_event_disable() for all CPU context
> > > events, and then adds HOTPLUG_OFFSET (-32) to arrive at: OFF +
> > > HOTPLUG_OFFSET = -33.
> > >
> > > That's smaller than ERROR and thus perf_event_enable() no-ops on events
> > > for offline CPUs (maybe we should try and plumb an error return for
> > > IOC_ENABLE).
> > >
> > > On online we subtract the HOTPLUG_OFFSET again and the event becomes a
> > > regular OFF, after which perf_event_enable() should work again.
> >
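
The state arithmetic quoted above, as a minimal userspace sketch. ERROR,
OFF and INACTIVE carry the values of the existing PERF_EVENT_STATE_*
constants; HOTPLUG_OFFSET is taken from the proposal and is not an existing
kernel constant. The helper names and model_enable() are made up for
illustration, and model_enable() only mimics the early bail-out of
perf_event_enable(), not the real function:

```c
/* Existing perf event state values (PERF_EVENT_STATE_* in the kernel). */
enum {
	STATE_ERROR    = -2,
	STATE_OFF      = -1,
	STATE_INACTIVE =  0,
};

/* From the proposal quoted above; an assumption, not a kernel constant. */
#define HOTPLUG_OFFSET	(-32)

/* CPU goes offline: the (already disabled) event is pushed below ERROR. */
static int state_on_offline(int state)
{
	return state + HOTPLUG_OFFSET;
}

/* CPU comes back online: undo the offset. */
static int state_on_online(int state)
{
	return state - HOTPLUG_OFFSET;
}

/*
 * Simplified model of the enable path: it does nothing for events which
 * are already enabled or which sit below ERROR, otherwise the event
 * becomes INACTIVE.
 */
static int model_enable(int state)
{
	if (state >= STATE_INACTIVE || state < STATE_ERROR)
		return state;
	return STATE_INACTIVE;
}
```

With these definitions state_on_offline(STATE_OFF) yields -33,
model_enable() leaves that value untouched, and state_on_online() brings
the event back to a regular OFF which can be enabled again.
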
> > I haven't got around to testing that as I was busy cleaning up the unholy
> > mess in the watchdog code.
> >
> > One other thing I stumbled over is:
> >
> >   perf_event_create()
> >   ....
> >     x86_hw_reserve(event)
> >
> >   if (__x86_pmu_event_init(event) < 0)
> >     event->destroy(event);
> >       x86_hw_release()
> >       ....
> >         cpus_read_lock();
> >
> > If that happens from a hotplug function, we are doomed.
> >
> > I mean, that particular watchdog event won't fail once the watchdog code
> > verifies that at init time (which it will do soon), but in general
> > event creation during hotplug is dangerous.
>
> Arghh!!!
>
> And allowing us to create events for offline CPUs (possible I think, but
> maybe slightly tricky) won't solve that, because we're already holding
> the hotplug_lock during PREPARE.

There are two ways to cure that:

1) Have a pre cpus_write_lock() stage which is serialized via
   cpus_add_remove_lock, which is the outer lock for hotplug.

   There we can sanely create stuff and let it fail with all consequences,
   including a destroy path which takes cpus_read_lock().

2) Have some deferred mechanism which destroys the event after a
   failure, but that might be tricky as RCU and workqueues might end up
   being flushed during hotplug, which creates the same mess again.
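
To illustrate why #1 avoids the problem, here is a minimal single-threaded
userspace model, not kernel code. The two flags stand in for
cpus_add_remove_lock and cpus_write_lock(); all function names are
hypothetical. A failing event creation is modelled by its destroy path,
which wants the hotplug lock for reading and would deadlock if the same
task already holds it for writing:

```c
#include <stdbool.h>

static bool add_remove_held;	/* models cpus_add_remove_lock */
static bool write_held;		/* models cpus_write_lock() being held */

/*
 * Models a failed event creation whose destroy path ends up in
 * cpus_read_lock(). Returns true when that is safe, i.e. the caller is
 * serialized by the outer lock and does not hold the write side.
 */
static bool failing_create_is_safe(void)
{
	return add_remove_held && !write_held;
}

/* Creation from a callback which runs under the write lock (PREPARE). */
static bool create_in_prepare(void)
{
	bool safe;

	add_remove_held = true;
	write_held = true;
	safe = failing_create_is_safe();	/* read lock under write lock */
	write_held = false;
	add_remove_held = false;
	return safe;
}

/* Option 1: create in a stage before the write lock is taken. */
static bool create_in_pre_stage(void)
{
	bool safe;

	add_remove_held = true;
	safe = failing_create_is_safe();	/* write side not held yet */
	write_held = true;
	/* The regular hotplug states would run here, without creation. */
	write_held = false;
	add_remove_held = false;
	return safe;
}
```

In this model create_in_prepare() reports false and create_in_pre_stage()
reports true. It says nothing about how such a stage would be wired into
the hotplug state machine.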

Thanks,

tglx