Re: NULL pointer dereference in autofs4_expire_wait

From: Ian Kent
Date: Fri Oct 11 2013 - 21:56:51 EST


On Fri, 2013-10-11 at 07:29 -0600, David Ahern wrote:
> On 10/11/13 3:55 AM, Ian Kent wrote:
> > On Fri, 2013-10-11 at 10:06 +0800, Ian Kent wrote:
> >> On Thu, 2013-10-10 at 17:22 -0600, David Ahern wrote:
> >>> Running 3.12-rc3 just hit BUG in autofs4_expire_wait
> >>
> >> It doesn't look like this could be due to Al's change to the locking in
> >> autofs4_wait() and that's the only change to autofs that I'm aware of.
> >>
> >> Could you do a bisect please?
> >
> > Of course that assumes it's repeatable.
> > Is it?
> >
> > Can you provide any information about the environment and activity that
> > was happening at the time of the BUG()?
>
> The system was up and running for 9 days before hitting the BUG. After
> that with 3 cpus on softlockup I had to do a reboot (forced). After the
> reboot I continued the workload again without a repeat incident (yet),
> so I am not sure bisect is going to be possible.

Yeah, it isn't repeatable.

>
> This is a corporate environment where practically everything is in an
> automount. Specific to this problem I was repeatedly building a
> workspace in one window, using cscope in another and checking code
> against a different workspace in a third -- all 3 of those were
> different automounts and different NAS servers.
>
> From objdump on vmlinux the line in question is fs/autofs4/expire.c:465
>
>     if (ino->flags & AUTOFS_INF_EXPIRING) {

Right, there haven't been changes to the autofs kernel code that affect
the reference counting of dentries so I have to conclude this is being
caused by other changes.

When walking an autofs path, the walk should always be put into refwalk
mode, so the function containing this line should always have a dentry
with a reference held. That in turn means the autofs info struct (ino
here) should still be valid at this point.

Now ->d_release() (which frees ino) is only called after the dentry
reference count falls to zero and the dentry is going away.

We can't check ino for NULL here because the dentry pointer to it isn't
set to NULL when it's freed in ->d_release(). Setting the dentry field
to NULL is futile because the next thing the VFS does is to free the
dentry itself. Well, it schedules the free via RCU anyway.
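
To make the lifetime argument concrete, here is a rough user-space sketch.
It is not the kernel code: every toy_* name is made up for illustration,
and the "live" flag only exists so the sketch can report a stale access
without actually touching freed memory. It assumes the behaviour described
above, namely that release runs when the count reaches zero and leaves the
dentry's pointer to the ino untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INF_EXPIRING 0x1u   /* hypothetical stand-in for AUTOFS_INF_EXPIRING */

/* Toy stand-in for the autofs info struct ("ino" in this thread). */
struct toy_ino {
	unsigned int flags;
	bool live;              /* false once "freed"; real code has no such flag */
};

/* Toy stand-in for a dentry: a reference count plus a private data pointer. */
struct toy_dentry {
	int count;
	struct toy_ino *fsdata;
};

/* Models ->d_release(): the ino goes away, but the dentry's pointer to it
 * is left as-is, because the dentry itself is about to be freed. */
static void toy_d_release(struct toy_dentry *d)
{
	d->fsdata->live = false;    /* stands in for freeing the ino */
}

static void toy_dget(struct toy_dentry *d)
{
	d->count++;
}

static void toy_dput(struct toy_dentry *d)
{
	assert(d->count > 0);
	if (--d->count == 0)
		toy_d_release(d);
}

/* Models the check at the faulting line.
 * Returns 1 if expiring, 0 if not, -1 if it would touch a released ino. */
static int toy_expire_check(const struct toy_dentry *d)
{
	const struct toy_ino *ino = d->fsdata;

	if (ino == NULL)            /* never true here: the pointer is not cleared */
		return 0;
	if (!ino->live)
		return -1;
	return (ino->flags & TOY_INF_EXPIRING) ? 1 : 0;
}

/* Walks through both cases; returns the result of the unbalanced one. */
static int toy_demo(void)
{
	struct toy_ino ino = { .flags = TOY_INF_EXPIRING, .live = true };
	struct toy_dentry d = { .count = 1, .fsdata = &ino };

	toy_dget(&d);                       /* walker takes its reference: count 2 */
	assert(toy_expire_check(&d) == 1);  /* reference held, ino is valid */

	toy_dput(&d);                       /* base reference dropped: count 1 */
	toy_dput(&d);                       /* unbalanced put: count 0, release runs */
	return toy_expire_check(&d);        /* walker still thinks it holds a ref */
}
```

With a reference properly held the check is safe. After one put too many,
the NULL test passes and the sketch reports the stale ino, which is the
situation a real dereference would hit.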

The fact that ->d_release() has been called makes me think there's a
reference counting problem somewhere in the VFS.

Al, is my thinking correct here?

There were some significant changes to this area of the VFS in 3.11 by
the look of it.

So more history please: had you used 3.11 for an extended amount of
time before moving to the 3.12-rc? IOW, what's your kernel version usage
history?

Ian

