Re: EXT4 Oops (Re: [PATCH V15 06/22] mmc: block: Add blk-mq support)

From: Dmitry Osipenko
Date: Mon Mar 05 2018 - 19:48:40 EST


On 01.03.2018 19:04, Theodore Ts'o wrote:
> On Thu, Mar 01, 2018 at 10:55:37AM +0200, Adrian Hunter wrote:
>> On 27/02/18 11:28, Adrian Hunter wrote:
>>> On 26/02/18 23:48, Dmitry Osipenko wrote:
>>>> But still something is wrong... I've been getting occasional EXT4 Ooops's, like
>>>> the one below, and __wait_on_bit() is always figuring in the stacktrace. It
>>>> never happened with blk-mq disabled, though it could be a coincidence and
>>>> actually unrelated to blk-mq patches.
>>>
>>>> [ 6625.992337] Unable to handle kernel NULL pointer dereference at virtual
>>>> address 0000001c
>>>> [ 6625.993004] pgd = 00b30c03
>>>> [ 6625.993257] [0000001c] *pgd=00000000
>>>> [ 6625.993594] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>>>> [ 6625.994022] Modules linked in:
>>>> [ 6625.994326] CPU: 1 PID: 19355 Comm: dpkg Not tainted
>>>> 4.16.0-rc2-next-20180220-00095-ge9c9f5689a84-dirty #2090
>>>> [ 6625.995078] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>>>> [ 6625.995595] PC is aht dx_probe+0x68/0x684
>>>> [ 6625.995947] LR is at __wait_on_bit+0xac/0xc8
>
> This doesn't seem to make sense; the PC is where we are currently
> executing, and LR is the "Link Register" where the flow of control
> will be returning after the current function returns, right? Well,
> dx_probe should *not* be returning to __wait_on_bit(). So this just
> seems.... weird.
>
> Ignoring the LR register, this stack trace looks sane... I can't see
> which pointer could be NULL and getting dereferenced, though. How
> easily can you reproduce the problem? Can you either (a) translate
> the PC into a line number, or better yet, if you can reproduce, add a
> series of BUG_ON's so we can see what's going on?
>
> + BUG_ON(frame);
> memset(frame_in, 0, EXT4_HTREE_LEVEL * sizeof(frame_in[0]));
> frame->bh = ext4_read_dirblock(dir, 0, INDEX);
> if (IS_ERR(frame->bh))
> return (struct dx_frame *) frame->bh;
>
> + BUG_ON(frame->bh);
> + BUG_ON(frame->bh->b_data);
> root = (struct dx_root *) frame->bh->b_data;
> if (root->info.hash_version != DX_HASH_TEA &&
> root->info.hash_version != DX_HASH_HALF_MD4 &&
> root->info.hash_version != DX_HASH_LEGACY) {
>
> These are "could never" happen scenarios from looking at the code, but
> that will help explain what is going on.
>
> If this is reliably only happening with mq, the only way I could see
> that if is something is returning an error when it previously wasn't.
> This isn't a problem we're seeing with any of our testing, though.

It happened today again, "BUG_ON(!frame->bh->b_data);" has been trapped.

kernel BUG at fs/ext4/namei.c:751!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 296 Comm: cron Not tainted
4.16.0-rc2-next-20180220-00095-ge9c9f5689a84-dirty #2100
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
PC is at dx_probe+0x308/0x694
LR is at __wait_on_bit+0xac/0xc8
pc : [<c033bc00>] lr : [<c0bfbff4>] psr: 60040013
sp : d545bc20 ip : c0170e88 fp : d545bc74
r10: 00000000 r9 : d545bca0 r8 : d4209300
r7 : 00000000 r6 : 00000000 r5 : d656e838 r4 : d545bcbc
r3 : 0000007b r2 : d5830800 r1 : d5831000 r0 : d4209300
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 1552004a DAC: 00000051
Process cron (pid: 296, stack limit = 0x4d1ebf14)
Stack: (0xd545bc20 to 0xd545c000)
bc20: 000002ea c0c019d4 60040113 014000c0 c029e640 d6cf3540 d545bc7c d545bc48
bc40: c02797f4 c0152804 d545bca4 00000007 d5830800 00000000 d656e838 00000001
bc60: d545bca0 00000000 d545bd0c d545bc78 c033d578 c033b904 c029e714 c029b088
bc80: 00000148 c0c01984 d65f6be0 00000000 d545be10 d545bd24 d545bd00 d5830800
bca0: d65f6bf8 d65f6c0c 00000007 d6547720 8420edbe c029eec8 00000000 d4209300
bcc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bce0: d545bd48 d65f6be0 d656e838 d65f6be0 d6547720 00000001 d545be10 00000000
bd00: d545bd44 d545bd10 c033d7c0 c033d1c8 d545bd34 d656e8b8 d656e838 d545be08
bd20: d656e838 00000000 d65f6be0 d656e838 d656e8b8 d6547720 d545bd8c d545bd48
bd40: c028ea50 c033d774 00000000 dead4ead ffffffff ffffffff d545bd58 d545bd58
bd60: d6d7f015 d545be08 00000000 00000000 d545bee8 d545bee8 d545bf28 00000000
bd80: d545bdd4 d545bd90 c028f310 c028e9b0 d545be08 80808080 d545be08 d6d7f010
bda0: d545bdd4 d545bdb0 c028df9c d545be08 d6d7f010 00000000 d545bee8 d545bee8
bdc0: d545bf28 00000000 d545be04 d545bdd8 c0290e24 c028f160 c0111a1c c0111674
bde0: d545be04 d545bdf0 00000001 d6d7f000 d545be08 00000001 d545beb4 d545be08
be00: c0293848 c0290da4 d6dd0310 d6547720 8420edbe 00000007 d6d7f015 0000000c
be20: d6dd0310 d6547098 d656e838 00000001 00000002 00000fe0 00000000 00000000
be40: 00000000 d545be48 c02797f4 00000ff0 d6d7f010 c102b4c8 d5522db8 d6d7f000
be60: c130bbdc 004f73f8 00000000 00000001 d545bf28 00000000 d6d7f000 00000000
be80: c0293570 00000002 ffffff9c 00000001 ffffff9c 00000001 ffffff9c d545bee8
bea0: ffffff9c 004f73f8 d545bedc d545beb8 c0293990 c02937b4 00000000 00000000
bec0: 00000000 beb93970 00000001 00000800 d545bf1c d545bee0 c028859c c0293948
bee0: 00000000 d545bfb0 00509070 00508d7c d545bfac beb93970 00000003 beb95cd0
bf00: 000000c3 c01011e4 d545a000 00000000 d545bfa4 d545bf20 c0288df4 c0288540
bf20: 000007ff c0152868 00000fff 000043d8 00000002 00001000 00000000 00000000
bf40: 00000874 00000000 0006037f 00000000 0b300031 00000000 00000000 0000006d
bf60: 00001000 00000000 5a9c7e8b 2d4cae00 5a0d222f 00000000 5a8c9273 22358b29
bf80: 5a8c8591 301168da 00000008 b6ea94fc 00030030 b6f91ab8 00000000 d545bfa8
bfa0: c0101000 c0288dc8 b6f91ab8 00000003 004f73f8 beb93970 beb93a90 3dc50800
bfc0: b6f91ab8 00000003 beb95cd0 000000c3 00509cec 00509070 00508d7c 00000002
bfe0: 000000c3 beb93968 b6ea354b b6e2ccf6 20030030 004f73f8 17bfd861 17bfdc61
[<c033bc00>] (dx_probe) from [<c033d578>] (ext4_find_entry+0x3bc/0x5ac)
[<c033d578>] (ext4_find_entry) from [<c033d7c0>] (ext4_lookup+0x58/0x1f4)
[<c033d7c0>] (ext4_lookup) from [<c028ea50>] (lookup_slow+0xac/0x15c)
[<c028ea50>] (lookup_slow) from [<c028f310>] (walk_component+0x1bc/0x2f0)
[<c028f310>] (walk_component) from [<c0290e24>] (path_lookupat+0x8c/0x1f0)
[<c0290e24>] (path_lookupat) from [<c0293848>] (filename_lookup+0xa0/0xfc)
[<c0293848>] (filename_lookup) from [<c0293990>] (user_path_at_empty+0x54/0x5c)
[<c0293990>] (user_path_at_empty) from [<c028859c>] (vfs_statx+0x68/0xc4)
[<c028859c>] (vfs_statx) from [<c0288df4>] (SyS_stat64+0x38/0x54)
[<c0288df4>] (SyS_stat64) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xd545bfa8 to 0xd545bff0)
bfa0: b6f91ab8 00000003 004f73f8 beb93970 beb93a90 3dc50800
bfc0: b6f91ab8 00000003 beb95cd0 000000c3 00509cec 00509070 00508d7c 00000002
bfe0: 000000c3 beb93968 b6ea354b b6e2ccf6
Code: e2833094 e587300c eaffff72 e7f001f2 (e7f001f2)
---[ end trace 60fa8eaa4e57e458 ]---