Re: [PATCH] drm/i915: Preallocate mmu notifier to unbreak cpu hotplug deadlock

From: Thomas Gleixner
Date: Thu Oct 05 2017 - 11:23:30 EST

On Thu, 5 Oct 2017, Daniel Vetter wrote:

> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.

That's an interesting multi chain circular dependency which is related to

Now the MSR device is not the only one which is creating that
dependency. We have CPUID and MCE as well. That's what a quick search in
x86 revealed. No idea whether there are more of those interesting bits and

To fix it on the hotplug side we'd have to introduce extra state space
which is handled outside the cpuhotplug_rwsem region, but inside of the
cpu_maps_update_begin()/end() region, which has a nasty pile of
implications vs. the state registration/deregistration as this stuff can be
built as modules. So we'd need a complete set of new interfaces and
handling routines with some explicit restrictions on those state callbacks.

I rather prefer not to go there unless its unavoidable, which brings me to
the obvious question about the stop_machine() usage in the graphics code.

void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);

The function name is telling. The machine is wedged and stop_machine()
might make it even more wedged when looking at this splat :)

The called function name is interesting as well. Is that _BKL postfix a
leftover of the BKL removal a couple of years ago?

Aside of that, is it really required to use stomp_machine() for this
synchronization? We certainly have less intrusive mechansisms than that.