Re: dcache_readdir NULL inode oops

From: Will Deacon
Date: Fri Nov 30 2018 - 11:32:12 EST


On Fri, Nov 30, 2018 at 04:08:52PM +0000, Al Viro wrote:
> On Fri, Nov 30, 2018 at 09:16:49AM -0600, Eric W. Biederman wrote:
> > >> > + inode_lock(parent->d_inode);
> > >> > dentry->d_fsdata = NULL;
> > >> > drop_nlink(dentry->d_inode);
> > >> > d_delete(dentry);
> > >> > + inode_unlock(parent->d_inode);
> > >> > +
> > >> > dput(dentry); /* d_alloc_name() in devpts_pty_new() */
> > >> > }
> > >
> > > This feels right but getting some feedback from others would be good.
> >
> > This is going to be special at least because we are not coming through
> > the normal unlink path and we are manipulating the dcache.
> >
> > This looks plausible. If this is what's going on, then we have had this
> > bug for a very long time. I will see if I can make some time.
> >
> > It looks like in the general case everything is serialized by the
> > devpts_mutex. I wonder if just changing the order of operations
> > here would be enough.
> >
> > AKA: drop_nlink, d_delete, then dentry->d_fsdata. Ugh, d_fsdata is not
> > implicated, so that won't help here.
>
> It certainly won't. The thing is, this
> if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
> d_inode(next)->i_ino, dt_type(d_inode(next))))
> in dcache_readdir() obviously can block, so all we can hold over it is
> blocking locks. Which we do - specifically, ->i_rwsem on our directory.
>
> It's actually worse than missing inode_lock() - consider the effects
> of mount --bind /mnt/foo /dev/pts/42. What happens when that thing
> goes away? Right, a lost mount...

Ha, I hadn't even considered that scenario. Urgh!

> I'll resurrect the "kernel-internal rm -rf done right" series and
> post it; devpts is not the only place suffering such problem (binfmt_misc,
> etc.)

Thanks. I'm happy to test that it solves this issue if you throw me on cc.

Will