Re: drm_vm.c:drm_mmap: possible circular locking dependency detected

From: Tejun Heo
Date: Sun Jan 03 2010 - 00:10:09 EST


Hello,

On 01/03/2010 11:06 AM, Eric W. Biederman wrote:
> Removed driver hardware isn't something sysfs can really guard
> against, although it can help to make the window of vulnerability
> smaller.

It can't protect against removal itself per se, but it does give the
driver a boundary which it can depend on while implementing hot
unplugging. Hardware which supports hot unplugging can cope with
surprise removal and has mechanisms to detect and handle it, but the
software part is still tricky and the driver needs to have a boundary
after which it can declare a device gone.

> Protecting driver internal data structures if we can does
> seem reasonable.

Also the case of driver detaching (and another driver attaching).

> The case I was thinking of in particular is when someone does:
> "rmmod driver" I think device_del protects from the code going away
> today.

Nope, that's protected by reference counting via fops and/or other
stuff.

>> If such separation is necessary, we can implement the split interface
>> while leaving kobject_del() as is feature-wise and convert the
>> offending ones to use the split interface but I think it would be
>> better to simply fix the offending ones if there aren't too many and
>> they're easily fixable. Let's see how many lockdep warnings turn up.
>
> - We have the network stack.
> I have hacked around that (when I thought it was a singleton)
> by introducing the idiom:
>
>     if (!rtnl_trylock())
>         return restart_syscall();
>
> But that isn't sustainable, as there is already one new entry that
> just does rtnl_lock unconditionally.
>
> Maybe we can move the device_del out from under the rtnl_lock, but I
> have my doubts. Certainly the proc and sysctl bits (which have the
> same issue) look more difficult.
>
> - We almost have an issue in ext4.
> Device_del is certainly called under lock_kernel() and lock_super().
>
> - We have a cpu_hotplug.lock issue with
> /sys/devices/system/cpu/cpuN/microcode/reload, a variant of the problem
> that triggered this discussion and it looks very non-trivial to solve.
>
> So I'm not certain what to say except that we have longstanding problems.
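
To make the quoted trylock idiom concrete, here is a minimal userspace
sketch of it. It is only a model: model_lock, model_store() and
MODEL_RESTART are hypothetical names standing in for the global lock,
an attribute handler and the "restart the syscall" return value; none
of them are kernel interfaces. The point it tries to show is that the
handler never sleeps on the lock while it is inside the node, so a
remover holding the lock can still drain it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_OK        0
#define MODEL_RESTART (-1)  /* hypothetical stand-in for "restart the syscall" */

/* Hypothetical stand-in for a global lock such as rtnl. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

static bool model_trylock(void)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set(&model_lock);
}

static void model_unlock(void)
{
    atomic_flag_clear(&model_lock);
}

/*
 * Model of an attribute store handler. If the lock is busy it backs
 * out instead of sleeping, so whoever holds the lock is not left
 * waiting for this handler to leave the node.
 */
static int model_store(int *value, int new_value)
{
    if (!model_trylock())
        return MODEL_RESTART;
    *value = new_value;
    model_unlock();
    return MODEL_OK;
}
```

In this model a caller that sees MODEL_RESTART simply calls
model_store() again later. The weakness noted in the quote carries
over: every handler has to follow the convention, and one
unconditional lock brings the dependency back.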

It's interesting that the above cases aren't common drivers. AFAICS,
the problem cases would usually be ones like the above, where the user
is a rather complex software entity, or drivers which implement some
form of self detaching via sysfs. For the former group, I agree that
splitting deletion and draining (or simply skipping either the draining
part or active reference counting, both of which achieve basically the
same thing) would be an easy way out, as it would generally be easy to
leave the data structures dangling till the references go away.

How about simply introducing an interface to mark sysfs nodes which
don't require active reference counting and using it on those nodes?
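
As a rough illustration of what such a mark would change, here is a
small userspace model. Everything in it is hypothetical (struct
model_node, the skip_active flag and the model_* helpers are made up
for this sketch and are not the sysfs implementation); it only assumes
that a handler takes an active reference on entry and that removal
waits for those references to drain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_node {
    int  active;       /* handlers currently inside the node */
    bool removed;      /* set once removal has started */
    bool skip_active;  /* the proposed mark: opt out of active refs */
};

/* Entry to a show/store handler. Fails once the node is being removed. */
static bool model_get_active(struct model_node *n)
{
    if (n->removed)
        return false;
    if (!n->skip_active)
        n->active++;
    return true;
}

/* Exit from a handler; pairs with a successful model_get_active(). */
static void model_put_active(struct model_node *n)
{
    if (n->skip_active)
        return;
    assert(n->active > 0);
    n->active--;
}

/*
 * Begin removal. Returns true if the remover would have to wait for
 * in-flight handlers (the draining step), false if it can proceed.
 */
static bool model_remove_must_drain(struct model_node *n)
{
    n->removed = true;
    return !n->skip_active && n->active > 0;
}
```

In this model a marked node never makes the remover wait, even with a
handler still inside it, so a remover holding a lock that the handler
wants cannot deadlock on draining. The cost is that the owner of a
marked node has to keep its data structures alive by other means until
the last handler returns.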

Thanks.

--
tejun