Re: [BUG 4.9/4.10] crash in __d_lookup() due to corrupted dentry_hashtable

From: Heiko Carstens
Date: Mon Mar 20 2017 - 08:10:01 EST


On Fri, Mar 03, 2017 at 02:31:50PM +0100, Heiko Carstens wrote:
> Hello Al,
>
> Gustavo reported the crash below within __d_lookup() on s390. I'm wondering
> if you can make any sense of it:
>
> Unable to handle kernel pointer dereference in virtual kernel address space
> Failing address: fffffffffffff000 TEID: fffffffffffff803
> Fault in home space mode while using kernel ASCE.

...

> Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> Looking at the relevant part of __d_lookup:
>
> struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
> {
> 	unsigned int hash = name->hash;
> 	struct hlist_bl_head *b = d_hash(hash); <--- points to corrupted entry
> 	struct hlist_bl_node *node;
> 	struct dentry *found = NULL;
> 	struct dentry *dentry;
>
> 	rcu_read_lock();
>
> 	hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
>
> 		if (dentry->d_name.hash != hash)
> 			continue;
> ...
>
> The contents of *b within the dump is:
>
> > struct hlist_bl_head 000003e0806248f8
> struct hlist_bl_head {
>   first = 0xffffffffffffffff
> }
>
> Note that 0x000003e0806248f8 is a valid address within the
> dentry_hashtable. In addition all other entries look ok, as far as I can
> tell. This is the only entry that contains a -1UL value.
>
> We also have a second dump with a similar crash with a 4.9 kernel. In that
> case there are in total three entries spread within the dentry_hashtable
> with a -1UL value, while all other entries seem to look ok. So there seems
> to be a pattern.
>
> Note: these kernels do contain addon patches that are not mainline, but I
> don't believe that any of those can explain these corruptions.

Famous last words... it looks like it was indeed one of our addon patches.

At least, with the bug fixed, Gustavo reported that the system now survives
a 60h stress test, which it previously didn't.
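
For anyone who wants to check a dump for the pattern described in the
quoted report (isolated hash buckets containing -1UL while all others look
sane), the scan can be sketched in user space as below. This is only an
illustration under stated assumptions: the table is assumed to have been
copied out of the dump as a flat array of unsigned long, one word per
struct hlist_bl_head, and find_poisoned_buckets() and
report_poisoned_buckets() are made-up names, not kernel or crash utility
interfaces.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): scan a copy of a hash table,
 * given as one unsigned long per bucket head, for buckets whose "first"
 * word is all ones (-1UL).
 *
 * Stores the indices of at most max_hits matching buckets in hits[] and
 * returns the total number of matching buckets found.
 */
size_t find_poisoned_buckets(const unsigned long *table, size_t nbuckets,
			     size_t *hits, size_t max_hits)
{
	size_t found = 0;

	for (size_t i = 0; i < nbuckets; i++) {
		if (table[i] != ~0UL)
			continue;
		if (found < max_hits)
			hits[found] = i;
		found++;
	}
	return found;
}

/*
 * Hypothetical helper: print the address of each recorded bucket, given
 * the address of bucket 0 in the dump (base). nhits must not exceed the
 * number of entries actually stored in hits[].
 */
void report_poisoned_buckets(unsigned long base, const size_t *hits,
			     size_t nhits)
{
	for (size_t i = 0; i < nhits; i++) {
		unsigned long addr = base +
			(unsigned long)(hits[i] * sizeof(unsigned long));

		printf("bucket %zu at 0x%016lx contains -1UL\n",
		       hits[i], addr);
	}
}
```

A scan like this only tells you which buckets are affected; it says nothing
about who wrote the value.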