Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]

From: Al Viro
Date: Fri Mar 06 2020 - 15:37:17 EST


On Fri, Mar 06, 2020 at 08:05:22PM +0000, Al Viro wrote:
> On Fri, Mar 06, 2020 at 07:58:23PM +0000, Al Viro wrote:
> > On Fri, Mar 06, 2020 at 07:43:22PM +0000, Al Viro wrote:
> > > On Fri, Mar 06, 2020 at 05:25:49PM +0100, Miklos Szeredi wrote:
> > > > On Tue, Mar 03, 2020 at 08:46:09AM +0100, Miklos Szeredi wrote:
> > > > >
> > > > > I'm doing a patch. Let's see how it fares in the face of all these
> > > > > preconceptions.
> > > >
> > > > Here's a first cut. Doesn't yet have superblock info, just mount info.
> > > > Probably has rough edges, but appears to work.
> > >
> > > For starters, you have just made namespace_sem held over copy_to_user().
> > > This is not going to fly.
> >
> > In case the above is too terse: you grab your mutex while under
> > namespace_sem (see attach_recursive_mnt()); the same mutex is held
> > while calling dir_emit(). Which can (and normally does) copy data
> > to userland-supplied buffer.
> >
> > NAK for that reason alone, and to be honest I had been too busy
> > suppressing the gag reflex to read and comment any deeper.
> >
> > I really hate that approach, in case it's not clear from the above.
> > To the degree that I don't trust myself to filter out the obscenities
> > if I try to comment on it right now.
> >
> > The only blocking thing we can afford under namespace_sem is GFP_KERNEL
> > allocation.
>
> Incidentally, attach_recursive_mnt() only gets you the root(s) of
> attached tree(s); try mount --rbind and see how much you've missed.

You are misreading mntput_no_expire(), BTW - your get_mount() can
bloody well race with umount(2), hitting the moment when we are done
figuring out whether it's busy but haven't cleared ->mnt_ns (let alone
set MNT_DOOMED) yet. If somebody calls umount(2) on a filesystem that
is not mounted anywhere else, they are not supposed to see the sucker
return 0 until the filesystem is shut down. You break that.
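
For readers following the locking objection quoted above (no copying to
userland while holding namespace_sem, GFP_KERNEL allocation being the only
blocking thing allowed there), here is a minimal userspace sketch of the
usual way around it: snapshot under the lock, drop the lock, then copy out.
It is not kernel code and not anything from the patch under review. Every
name in it is a hypothetical stand-in: topology_lock plays the role of
namespace_sem, mount_table is the data it protects, slow_copy_out() stands
in for copy_to_user(), and malloc() stands in for a GFP_KERNEL allocation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for namespace_sem and the data it protects. */
static pthread_mutex_t topology_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *mount_table[] = { "/", "/proc", "/sys" };
#define MOUNT_COUNT (sizeof(mount_table) / sizeof(mount_table[0]))

/*
 * Hypothetical stand-in for copy_to_user(): in the kernel this can fault
 * and block for an unbounded time, which is why it must not run under
 * the lock. Returns 0 on success, -1 if the destination is too small.
 */
static int slow_copy_out(char *dst, size_t dst_len,
                         const char *src, size_t src_len)
{
    if (src_len > dst_len)
        return -1;
    memcpy(dst, src, src_len);
    return 0;
}

/*
 * Writes the mount points into user_buf as consecutive NUL-terminated
 * strings. Returns the number of bytes written, or -1 on failure.
 */
long list_mounts(char *user_buf, size_t user_len)
{
    size_t need = 0, off = 0, i;
    char *snap;
    int rc;

    pthread_mutex_lock(&topology_lock);

    for (i = 0; i < MOUNT_COUNT; i++)
        need += strlen(mount_table[i]) + 1;

    /* Allocation is the only blocking operation done under the lock. */
    snap = malloc(need);
    if (!snap) {
        pthread_mutex_unlock(&topology_lock);
        return -1;
    }

    /* Take a private snapshot while the table cannot change. */
    for (i = 0; i < MOUNT_COUNT; i++) {
        size_t n = strlen(mount_table[i]) + 1;
        memcpy(snap + off, mount_table[i], n);
        off += n;
    }
    assert(off == need);

    pthread_mutex_unlock(&topology_lock);

    /* The potentially slow copy happens only after the lock is dropped. */
    rc = slow_copy_out(user_buf, user_len, snap, need);
    free(snap);
    return rc ? -1 : (long)need;
}
```

The price is that the caller gets a snapshot that may already be stale by
the time it is copied out; the sketch says nothing about the umount(2) race
described above, which is a separate problem.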