Re: Dcache oops

From: Linus Torvalds
Date: Fri Jun 03 2016 - 17:18:22 EST


On Fri, Jun 3, 2016 at 1:07 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Aha... It's load_unaligned_zeropad() from dentry_string_cmp(), hitting
> a genuinely unmapped address. That sends it into fixup, where it tries to
> load an aligned word containing the address in question, in hope that
> fault was on attempt to cross into the next page. No such luck, address
> was aligned in the first place (it's in %rdx - 0xffff880113f82000), so
> we still oops.

Hmm. We do end up comparing the string length racily with the name
address, so I suspect there could be a real bug in dentry_cmp: the
fact that the name pointer and length aren't necessarily atomic wrt
each other means that we could overrun the dentry pointer to the next
page, even though it's aligned.

So maybe we need to do a careful load of the aligned dentry string
value too, not just the possibly unaligned qstr name. It's a *very*
unlikely race to hit, though. You'd have to shrink the name in the
dentry, get the old (longer name length) to match the one you look up,
and when have the memory unmapped at the end of the new (short)
length.

However, this is not that theoretical race case for two reasons:

(1) this happens in __d_lookup(), which has taken the dentry lock. So
it's not the racy unlocked RCU case to begin with.

(2) as you point out, since it is the load_unaligned_zeropad() case,
it isn't the possibly racy dentry name at all, but trhe qstr we're
comparing against (which may have the unaligned case, but not the
confusion about length).

> The unexpected part is that unmapped address did *NOT* come from a dentry;
> it's .name of qstr we were looking for. And your call chain was
> __d_lookup() <- d_lookup() <- lookup_open(), so in lookup_open() it was
> nd->last.name...

That should have no such issues. It should be a normal qstr, and the
length should be reliable.

So something must have corrupted the qstr.

The remaining length *should* in %edi, judging by the

0xffffffff81243b82 <+306>: cmp $0x7,%edi

in the __d_lookup() disassembly. And %rdi contains 2, so there were
supposed to be two more characters at 'ct' (which is %rdx).

Why would nd->last.name be bogus? I don't see anything.

Linus