Re: fs: NULL deref in atime_needs_update

From: Al Viro
Date: Sat Feb 20 2016 - 12:10:56 EST


On Sat, Feb 20, 2016 at 02:25:40PM +0100, Mickaël Salaün wrote:

> I think the bug may be somewhere in the nd->depth handling (when its value is 0) in fs/namei.c:get_link(): struct saved *last = nd->stack + nd->depth - 1

Getting there with nd->depth == 0 would certainly be a bug - it would mean
that we got there without should_follow_link() having returned 1.

In case of open() it would be "do_last() has returned positive without
should_follow_link() having returned 1".
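
To make the underrun concrete, here is a minimal userspace model of the
arithmetic in the quoted get_link() line. It is only a sketch: the struct
and function names (nameidata_model, last_saved_index, etc.) and the stack
size are made up for illustration and are not the fs/namei.c definitions.
It works with slot indices, so it never forms an out-of-bounds pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STACK_SLOTS 2	/* hypothetical size, for this model only */

/* Stand-in for the kernel's struct saved; contents do not matter here. */
struct saved_model {
	int placeholder;
};

/* Stand-in for the two struct nameidata fields the quoted line uses. */
struct nameidata_model {
	unsigned depth;
	struct saved_model stack[MODEL_STACK_SLOTS];
};

/*
 * Index of the slot that "nd->stack + nd->depth - 1" would point at.
 * With depth == 0 this is -1, i.e. one element before the array.
 */
static ptrdiff_t last_saved_index(const struct nameidata_model *nd)
{
	assert(nd != NULL);
	return (ptrdiff_t)nd->depth - 1;
}

/* Nonzero if that slot actually lies inside the model's stack. */
static int last_saved_in_bounds(const struct nameidata_model *nd)
{
	ptrdiff_t idx = last_saved_index(nd);

	return idx >= 0 && idx < MODEL_STACK_SLOTS;
}
```

In this model a depth of 0 yields index -1, which is the underrun; any
depth from 1 up to the stack size yields a valid slot.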

<looks>

OK, there are several places where we rely on not getting bogus return values
- inode_permission() should not return positives, neither should vfs_open(),
security_path_truncate() and notify_change().

Other similar "handle the last component" functions are guaranteed to
never return positives other than directly from should_follow_link(), so
they are OK.

IIRC, you used LSM to inject a positive value to inode_permission(), right?

Another way to trigger that would've been ->open() returning positive -
a bug on *anything* since ->open() had been introduced in 0.95. Amount of
harm would vary - e.g. 0.95 would simply have that positive number returned
to userland, looking like successful open(2). With no new descriptor, of
course...

Short-term we probably want just
	if (unlikely(error > 0)) {
		WARN_ON(1);
		error = -EINVAL;
	}
added right after out: in do_last(), try to trigger Dmitry's reproducers
on it and then work back to the source of that thing *if* that's what's
happening in his case. Yours almost certainly is just that.
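
For anyone who wants to see the effect of that check in isolation, a
userspace sketch of the same clamping logic follows. It is an illustration
only: clamp_bogus_positive() and check_clamp() are hypothetical names, and
the fprintf() merely stands in for WARN_ON(), which has no userspace
equivalent.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Model of the proposed check: the caller expects 0 or -E<something>,
 * so any positive value is reported and turned into -EINVAL.
 */
static int clamp_bogus_positive(int error)
{
	if (error > 0) {
		/* stand-in for WARN_ON(1) */
		fprintf(stderr, "warning: bogus positive return %d\n", error);
		return -EINVAL;
	}
	return error;
}

/* Usage example: zero and negative errors pass through untouched. */
static void check_clamp(void)
{
	assert(clamp_bogus_positive(0) == 0);
	assert(clamp_bogus_positive(-ENOENT) == -ENOENT);
	assert(clamp_bogus_positive(1) == -EINVAL);
}
```

The point of clamping rather than just warning is that the caller then
sees an ordinary error and never takes the "follow the link" path.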

Longer-term... I'm not sure. Having a method that is supposed to return 0
or -E<something> actually return positive is going to be a bad thing, no
matter what, but "that bogus value gets passed to userland" is a lot
more tolerable than "kernel memory corruption". do_last() calling conventions
make it vulnerable to the latter, and as far as nd->stack underruns go, that's
it, but I'm not sure we don't have other places where such a bug in a driver,
etc. would translate into a mess ;-/

OK, in any case, let's start with checking if Dmitry is seeing that and not
something else. I still don't understand his stack traces - the fault
address quoted in his first posting doesn't match the register values in
the same trace, and there's also a possibility that it's some RCU-related
crap. This should give a warning and prevent an oops if we are hitting
a stack underrun on bogus positive from do_last(). Dmitry, could you try
to build with delta below and run your reproducer(s)?

diff --git a/fs/namei.c b/fs/namei.c
index f624d13..e30deef 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3273,6 +3273,10 @@ opened:
 		goto exit_fput;
 	}
 out:
+	if (unlikely(error > 0)) {
+		WARN_ON(1);
+		error = -EINVAL;
+	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
 	path_put(&save_parent);