Re: fs: NULL deref in atime_needs_update
From: Dmitry Vyukov
Date: Mon Feb 22 2016 - 06:20:57 EST
On Sat, Feb 20, 2016 at 6:10 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Feb 20, 2016 at 02:25:40PM +0100, MickaÃl SalaÃn wrote:
>
>> I think the bug may be somewhere in the nd->depth handling (when its value is 0) in fs/namei.c:get_link(): struct saved *last = nd->stack + nd->depth - 1
>
> Getting there with nd->depth == 0 would certainly be a bug - it would mean
> that we got there without should_follow_link() having returned 1.
>
> In case of open() it would be "do_last() has returned positive without
> should_follow_link() having returned 1".
>
> <looks>
>
> OK, there are several places where we rely on not getting bogus return values
> - inode_permission() should not return positives, neither should vfs_open(),
> security_path_truncate() and notify_change().
>
> Other similar "handle the last component" functions are guaranteed to
> never return positives other than directly from should_follow_link(), so
> they are OK.
>
> IIRC, you used LSM to inject a positive value to inode_permission(), right?
>
> Another way to trigger that would've been ->open() returning positive -
> a bug on *anything* since ->open() had been introduced in 0.95. Amount of
> harm would vary - e.g. 0.95 would simply have that positive number returned
> to userland, looking like successful open(2). With no new descriptor, of
> course...
>
> Short-term we probably want just
> if (unlikely(error > 0)) {
> WARN_ON(1);
> error = -EINVAL;
> }
> added right after out: in do_last(), try to trigger Dmitry's reproducers
> on it and then work back to the source of that thing *if* that's what's
> happening in his case. Yours almost certainly is just that.
>
> Longer-term... I'm not sure. Having a method that is supposed to return 0
> or -E<something> actually return positive is going to be a bad thing, no
> matter what, but "that bogus value gets passed to userland" is a lot
> more tolerable than "kernel memory corruption". do_last() calling conventions
> make it vulnerable to the latter, and as far as nd->stack underruns that's
> it, but I'm not sure we don't have other places where such bug in driver,
> etc. would translate into mess ;-/
>
> OK, in any case, let's start with checking if Dmitry is seeing that and not
> something else. I still don't understand his stack traces - the fault
> address quoted in his first posting doesn't match the register values in
> the same trace, and there's also a possibility that it's an RCU-related
> crap. This should give a warning and prevent an oops if we are hitting
> a stack underrun on bogus positive from do_last(). Dmitry, could you try
> to build with delta below and run your reproducer(s)?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index f624d13..e30deef 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3273,6 +3273,10 @@ opened:
> goto exit_fput;
> }
> out:
> + if (unlikely(error > 0)) {
> + WARN_ON(1);
> + error = -EINVAL;
> + }
> if (got_write)
> mnt_drop_write(nd->path.mnt);
> path_put(&save_parent);
I've reproduced the second report (the one originating in openat) with
this patch and the WARNING did _not_ fire:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 2 PID: 17525 Comm: syz-executor Not tainted 4.5.0-rc5+ #331
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88002c6ddf00 ti: ffff88002c740000 task.ti: ffff88002c740000
RIP: 0010:[<ffffffff81821ded>] [<ffffffff81821ded>]
atime_needs_update+0x2d/0x460
RSP: 0018:ffff88002c747a48 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88002c747d88
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000c
RBP: ffff88002c747a70 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88002c747d98
R13: 0000000000000000 R14: ffff88002c747d98 R15: ffff88002c747d78
FS: 00007f24da3d9700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000002003f000 CR3: 000000002e2d0000 CR4: 00000000000006e0
Stack:
ffff88002c747d40 ffff88002c747e08 0000000000000000 ffff88002c747d98
ffff88002c747d78 ffff88002c747ab8 ffffffff817eeda2 ffff88002c747d78
ffff880030fa88e8 ffff88002c747c98 0000000000000000 ffff88002c747d40
Call Trace:
[< inline >] get_link fs/namei.c:1006
[<ffffffff817eeda2>] trailing_symlink+0x142/0x760 fs/namei.c:2094
[<ffffffff817f5cec>] path_openat+0xb4c/0x5760 fs/namei.c:3393
[<ffffffff817fe13e>] do_filp_open+0x18e/0x250 fs/namei.c:3425
[<ffffffff817c2dbc>] do_sys_open+0x1fc/0x420 fs/open.c:1022
[< inline >] SYSC_openat fs/open.c:1049
[<ffffffff817c3050>] SyS_openat+0x30/0x40 fs/open.c:1043
[<ffffffff8669f6b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
Code: 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 98 17 d5
ff 48 8d 7b 0c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f>
b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP [<ffffffff81821ded>] atime_needs_update+0x2d/0x460 fs/inode.c:1617
RSP <ffff88002c747a48>
---[ end trace 872348222bfe81b0 ]---
This is _not_ on a tmpfs mount.
Regarding the registers, here is disassembly. The crash happens on a
KASAN check of the
ffffffff81821dc0 <atime_needs_update>:
ffffffff81821dc0: 55 push %rbp
ffffffff81821dc1: 48 89 e5 mov %rsp,%rbp
ffffffff81821dc4: 41 57 push %r15
ffffffff81821dc6: 41 56 push %r14
ffffffff81821dc8: 41 55 push %r13
ffffffff81821dca: 41 54 push %r12
ffffffff81821dcc: 49 89 fc mov %rdi,%r12
ffffffff81821dcf: 53 push %rbx
ffffffff81821dd0: 48 89 f3 mov %rsi,%rbx
ffffffff81821dd3: e8 98 17 d5 ff callq
ffffffff81573570 <__sanitizer_cov_trace_pc>
ffffffff81821dd8: 48 8d 7b 0c lea 0xc(%rbx),%rdi
ffffffff81821ddc: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
ffffffff81821de3: fc ff df
ffffffff81821de6: 48 89 fa mov %rdi,%rdx
ffffffff81821de9: 48 c1 ea 03 shr $0x3,%rdx
ffffffff81821ded: 0f b6 14 02 movzbl (%rdx,%rax,1),%edx
ffffffff81821df1: 48 89 f8 mov %rdi,%rax
ffffffff81821df4: 83 e0 07 and $0x7,%eax
ffffffff81821df7: 83 c0 03 add $0x3,%eax
ffffffff81821dfa: 38 d0 cmp %dl,%al
ffffffff81821dfc: 7c 08 jl
ffffffff81821e06 <atime_needs_update+0x46>
ffffffff81821dfe: 84 d2 test %dl,%dl
ffffffff81821e00: 0f 85 03 03 00 00 jne
ffffffff81822109 <atime_needs_update+0x349>
ffffffff81821e06: f6 43 0c 02 testb $0x2,0xc(%rbx)
ffffffff81821e0a: 0f 85 1a 02 00 00 jne
ffffffff8182202a <atime_needs_update+0x26a>
ffffffff81821e10: e8 5b 17 d5 ff callq
ffffffff81573570 <__sanitizer_cov_trace_pc>
ffffffff81821e15: 48 8d 7b 28 lea 0x28(%rbx),%rdi
ffffffff81821e19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
ffffffff81821e20: fc ff df
ffffffff81821e23: 48 89 fa mov %rdi,%rdx
ffffffff81821e26: 48 c1 ea 03 shr $0x3,%rdx
ffffffff81821e2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
ffffffff81821e2e: 0f 85 c4 03 00 00 jne
ffffffff818221f8 <atime_needs_update+0x438>
It means that inode is NULL here:
bool atime_needs_update(const struct path *path, struct inode *inode)
{
struct vfsmount *mnt = path->mnt;
struct timespec now;
if (inode->i_flags & S_NOATIME)
return false;