Re: dcache_readdir NULL inode oops
From: Eric W. Biederman
Date: Fri Nov 30 2018 - 10:17:00 EST
"gregkh@xxxxxxxxxxxxxxxxxxx" <gregkh@xxxxxxxxxxxxxxxxxxx> writes:
> Adding Eric as he touched this code last :)
>
> On Thu, Nov 29, 2018 at 07:25:48PM +0000, Jan Glauber wrote:
>> On Wed, Nov 28, 2018 at 08:08:06PM +0000, Will Deacon wrote:
>> > I spent some more time looking at this today...
>> >
>> > On Fri, Nov 23, 2018 at 06:05:25PM +0000, Will Deacon wrote:
>> > > Doing some more debugging, it looks like the usual failure case is where
>> > > one CPU clears the inode field in the dentry via:
>> > >
>> > > devpts_pty_kill()
>> > > -> d_delete() // dentry->d_lockref.count == 1
>> > > -> dentry_unlink_inode()
>> > >
>> > > whilst another CPU gets a pointer to the dentry via:
>> > >
>> > > sys_getdents64()
>> > > -> iterate_dir()
>> > > -> dcache_readdir()
>> > > -> next_positive()
>> > >
>> > > and explodes on the subsequent inode dereference when trying to pass the
>> > > inode number to dir_emit():
>> > >
>> > > if (!dir_emit(..., d_inode(next)->i_ino, ...))
>> > >
>> > > Indeed, the hack below triggers a warning, indicating that the inode
>> > > is being cleared concurrently.
>> > >
>> > > I can't work out whether the getdents64() path should hold a refcount
>> > > to stop d_delete() in its tracks, or whether devpts_pty_kill() shouldn't
>> > > be calling d_delete() like this at all.
>> >
>> > So the issue is that opening /dev/pts/ptmx creates a new pty in /dev/pts,
>> > which disappears when you close /dev/pts/ptmx. Consequently, when we tear
>> > down the dentry for the magic new file, we have to take the i_node rwsem of
>> > the *parent* so that concurrent path walkers don't trip over it whilst its
>> > being freed. I wrote a simple concurrent program to getdents(/dev/pts/) in
>> > one thread, whilst another opens and closes /dev/pts/ptmx: it crashes the
>> > kernel in seconds.
>>
>> I also made a testcase and verified that your fix is fine. I also tried
>> replacing open-close on /dev/ptmx with mkdir-rmdir but that does not
>> trigger the error.
>>
>> > Patch below, but I'd still like somebody else to look at this, please.
>>
>> I wonder why no inode_lock on parent is needed for devpts_pty_new(), but
>> I'm obviously not a VFS expert... So your patch looks good to me and
>> clearly solves the issue.
>>
>> thanks,
>> Jan
>>
>> > Will
>> >
>> > --->8
>> >
>> > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
>> > index c53814539070..50ddb95ff84c 100644
>> > --- a/fs/devpts/inode.c
>> > +++ b/fs/devpts/inode.c
>> > @@ -619,11 +619,17 @@ void *devpts_get_priv(struct dentry *dentry)
>> > */
>> > void devpts_pty_kill(struct dentry *dentry)
>> > {
>> > - WARN_ON_ONCE(dentry->d_sb->s_magic != DEVPTS_SUPER_MAGIC);
>> > + struct super_block *sb = dentry->d_sb;
>> > + struct dentry *parent = sb->s_root;
>> >
>> > + WARN_ON_ONCE(sb->s_magic != DEVPTS_SUPER_MAGIC);
>
> Side note, I wonder if this is even needed anymore...
>
>> > +
>> > + inode_lock(parent->d_inode);
>> > dentry->d_fsdata = NULL;
>> > drop_nlink(dentry->d_inode);
>> > d_delete(dentry);
>> > + inode_unlock(parent->d_inode);
>> > +
>> > dput(dentry); /* d_alloc_name() in devpts_pty_new() */
>> > }
>
> This feels right but getting some feedback from others would be good.
This is going to be special at least because we are not coming through
the normal unlink path and we are manipulating the dcache.
This looks plausible. If this is whats going on then we have had this
bug for a very long time. I will see if I can make some time.
It looks like in the general case everything is serialized by the
devpts_mutex. I wonder if just changing the order of operations
here would be enough.
AKA: drop_nlink d_delete then dentry->d_fsdata. Ugh d_fsdata is not
implicated so that won't help here.
I will look more shortly. I am in the middle in the move so I don't
have time to complete the analysis today.
Eric