Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek

From: Al Viro
Date: Thu Apr 04 2024 - 04:21:32 EST


On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> >
> > In the lockdep dependency chain, overlayfs inode lock is taken
> > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > layer of overlayfs, which is sane.
> >
> > With /sys/power/resume (and probably other files), sysfs also
> > behaves as a stacking filesystem, calling vfs helpers, such as
> > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > filesystem, without all the precautions that come with behaving
> > as a stacked filesystem.
>
> No. This is far worse than anything stacked filesystems do - it's
> an arbitrary pathname resolution while holding a lock.
> It's not local. Just about anything (including automounts, etc.)
> can be happening there and it pushes the lock in question outside
> of *ALL* pathwalk-related locks. Pathname doesn't have to
> resolve to anything on overlayfs - it can just go through
> a symlink on it, or walk into it and traverse a bunch of ..
> afterwards, etc.
>
> Don't confuse that with stacking - it's not even close.
> You can't use that anywhere near overlayfs layers.
>
> Maybe isolate it into a separate filesystem, to be automounted
> on /sys/power. And make anyone playing with overlayfs with
> sysfs as a layer mount the damn thing on top of power/ in your
> overlayfs. But using that thing as part of a layer is
> a non-starter.
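
To make the ordering problem above concrete, here is a minimal
userspace sketch of a lock-order graph. It is not kernel code and not
how lockdep is implemented; the lock class names and the
record_order() helper are made up for illustration. It also assumes,
per the discussion above, that a pathname resolution done under
of->mutex can end up taking an overlayfs inode lock; the exact chain
in the report may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock classes, named after the locks in this thread. */
enum { OVL_INODE, KERNFS_OF_MUTEX };

/* edge[a][b] is true when lock b has been taken while a was held. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "inner taken while outer is held".  Returns false and records
 * nothing if inner already precedes outer, i.e. the new edge would
 * close a cycle (a possible deadlock).
 */
bool record_order(int outer, int inner)
{
	bool seen[MAX_LOCKS] = { false };

	assert(outer >= 0 && outer < MAX_LOCKS);
	assert(inner >= 0 && inner < MAX_LOCKS);
	if (reachable(inner, outer, seen))
		return false;
	edge[outer][inner] = true;
	return true;
}

/* Replays the two orderings discussed above; true if they conflict. */
bool resume_scenario_has_cycle(void)
{
	memset(edge, 0, sizeof(edge));

	/* llseek via overlayfs: ovl inode lock held, of->mutex taken. */
	bool first = record_order(OVL_INODE, KERNFS_OF_MUTEX);

	/*
	 * Write to /sys/power/resume: of->mutex held, then a pathwalk
	 * that is assumed to take an overlayfs inode lock.
	 */
	bool second = record_order(KERNFS_OF_MUTEX, OVL_INODE);

	return first && !second;
}
```

In this model the first ordering is accepted and the second is
rejected, because it would make each lock nest inside the other.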

Incidentally, why do you need to lock overlayfs inode to call
vfs_llseek() on the underlying file? It might (or might not)
need to lock the underlying file (for things like ->i_size,
etc.), but that will be done by ->llseek() instance and it
would deal with the inode in the layer, not overlayfs one.

Similar question applies to ovl_write_iter() - why do you
need to hold the overlayfs inode locked during the call of
backing_file_write_iter()?