Re: [Bluez-devel] Oops involving RFCOMM and sysfs

From: Tejun Heo
Date: Sat Jan 05 2008 - 21:08:27 EST


Hello,

Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted. We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
>
> Turn? Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

Right, I haven't thought about that. When sysfs_get_dentry() is called,
@sd is always valid so unless there was existing negative dentry, lookup
is guaranteed to return positive dentry, but by populating dcache with
negative dentry before a node is created, things can go wrong. I don't
think that's what's going on here tho. If that was the case, the
while() loop looking up the next sd to lookup (@cur) should have blown
up as negative dentry will have NULL d_fsdata which doesn't match any sd.

I guess what's needed here is d_revalidate() as other distributed
filesystems do. I'll test whether this can be actually triggered and
prepare a fix. Thanks a lot for pointing out the problem.

>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit? The locking in sysfs is unconventional but
>> that's mostly from necessity. It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
>
> ... with all fun that creates. As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup(). So you can't have
> sysfs_get_dentry() under it. So you don't have exclusion with arseloads
> of sysfs tree changes in there. Joy...

There are two locks. sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks. sysfs_rename_mutex() protects against move/rename which can
change the ancestry of a held sysfs_dirent while sysfs_mutex protects
the sd hierarchy itself. Locking can be wrong if sysfs_rename_mutex
locking is missing from the places where ancestry of a held sd can
change but I can't find one ATM. If I'm missing your point again, feel
free to scream at me. :-)

As it's unnecessarily unintuitive, there's a pending change to rename
sysfs_rename_mutex and use it to protect the whole tree structure to
make locking simpler while using sysfs_mutex to guard VFS access such
that the locking hierarchy plainly becomes sysfs_rename_mutex - VFS
locks - sysfs_mutex where all internal sysfs structure is protected by
the outer mutex and the inner one just protects VFS accesses.

> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried. I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.

Yeah, it's too late to get rid of sysfs and regardless implementation
ugliness, which BTW I think has improved a lot during last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.

Oh, BTW, the ugly lookup_one_noperm() can be removed if LOOKUP_NOPERM
flag is added. The only reason sysfs_lookup() uses the specialized
lookup is to avoid permission check.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/