Re: proc_flush_task oops

From: Linus Torvalds
Date: Mon Dec 18 2017 - 18:50:59 EST


On Mon, Dec 18, 2017 at 3:10 PM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
> On Mon, Dec 18, 2017 at 10:15:41PM +0000, Al Viro wrote:
> > On Mon, Dec 18, 2017 at 04:44:38PM -0500, Dave Jones wrote:
> > > I've hit this twice today. It's odd, because afaics, none of this code
> > > has really changed in a long time.
> >
> > Which tree had that been?
>
> Linus, rc4.

Ok, so the original report was marked as spam for me for whatever
reason. I ended up re-analyzing the oops, but came to the same
conclusion you did: it's a NULL mnt pointer in proc_flush_task_mnt().

The code disassembles to

   0:   c1 e2 04                shl    $0x4,%edx
   3:   44 8b 60 30             mov    0x30(%rax),%r12d
   7:   48 8b 40 38             mov    0x38(%rax),%rax
   b:   44 8b 34 11             mov    (%rcx,%rdx,1),%r14d
   f:   48 c7 c2 60 3a f5 81    mov    $0xffffffff81f53a60,%rdx
  16:   44 89 e1                mov    %r12d,%ecx
  19:   4c 8b 68 58             mov    0x58(%rax),%r13
  1d:   e8 4b b4 77 00          callq  0x77b46d
  22:   89 44 24 14             mov    %eax,0x14(%rsp)
  26:   48 8d 74 24 10          lea    0x10(%rsp),%rsi
  2b:*  49 8b 7d 00             mov    0x0(%r13),%rdi   <-- trapping instruction
  2f:   e8 b9 6a f9 ff          callq  0xfffffffffff96aed
  34:   48 85 c0                test   %rax,%rax
  37:   74 1a                   je     0x53
  39:   48 89 c7                mov    %rax,%rdi

and just matching that up against the code I see generated, that first
call is the call to snprintf, and the second call is to
d_hash_and_lookup.

So it's one of these two patterns (pid vs tgid):

        name.len = snprintf(buf, sizeof(buf), "%d", pid);
        /* no ->d_hash() rejects on procfs */
        dentry = d_hash_and_lookup(mnt->mnt_root, &name);

and that "mov 0x0(%r13),%rdi" that traps is "mnt->mnt_root".
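
To make the offset argument concrete, here is a minimal userspace sketch of
why a load from 0x0(%r13) reads as "mnt->mnt_root". It assumes mnt_root is
the first member of the mount structure, which is what the zero displacement
in the trapping instruction suggests. The names vfsmount_model and
model_mnt_root are hypothetical stand-ins for illustration, not the kernel
definitions, and the NULL check only shows where the fault comes from; it is
not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins, NOT the real kernel structures. */
struct dentry {
	int unused;
};

struct vfsmount_model {
	struct dentry *mnt_root;	/* assumed first member: offset 0 */
	void *mnt_sb;
	int mnt_flags;
};

/*
 * With mnt_root at offset 0, reading mnt->mnt_root is a load from
 * (mnt + 0), i.e. "mov 0x0(%reg),%rdi" with mnt held in %reg.
 * If mnt is NULL, that load dereferences address 0.
 */
static_assert(offsetof(struct vfsmount_model, mnt_root) == 0,
	      "mnt_root is expected at offset 0 in this model");

/* Hypothetical helper: returns the root dentry, or NULL if mnt is NULL. */
static struct dentry *model_mnt_root(const struct vfsmount_model *mnt)
{
	if (!mnt)
		return NULL;	/* the case the oops hit without a check */
	return mnt->mnt_root;
}
```

In this model, calling model_mnt_root(NULL) returns NULL instead of
faulting, whereas the unguarded "mnt->mnt_root" in the pattern above would
trap exactly as the oops shows.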

But I don't see what would have changed in this area recently.

Do you end up saving the seeds that cause crashes? Is this
reproducible? (Other than seeing it twice, of course)

Linus