Re: Linux 4.8-rc1 Cirrus QEMU crashes on boot.

From: Linus Torvalds
Date: Mon Aug 08 2016 - 17:19:29 EST


On Mon, Aug 8, 2016 at 1:24 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Booting up v4.8-rc1 in qemu today I ran I ran into this beautiful oops.
>
> I am just about to head out the door on vacation until the end of the
> month so hopefully I am tossing out enough information to the right
> people.
>
> I have confirmed that disabling CONFIG_DRM_CIRRUS_QEMU avoids this
> crash.
>
> [ 0.720167] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [ 0.721348] IP: [<ffffffff8d5ef9ac>] drm_pick_crtcs+0xfc/0x270
> [ 0.722249] PGD 0
> [ 0.722659] Oops: 0000 [#1] SMP
> [ 0.723141] Modules linked in:
> [ 0.723643] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1+ #116
> [ 0.724602] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 0.725450] task: ffff8800bbf28000 task.stack: ffff8800bbf30000
> [ 0.726327] RIP: 0010:[<ffffffff8d5ef9ac>] [<ffffffff8d5ef9ac>] drm_pick_crtcs+0xfc/0x270
> [ 0.727648] RSP: 0000:ffff8800bbf33978 EFLAGS: 00010217
> [ 0.728444] RAX: ffffffff8d623a20 RBX: 0000000000000000 RCX: ffff8800baf3f860
> [ 0.729498] RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff8800baf3f800
> [ 0.730626] RBP: ffff8800bbf339f8 R08: 0000000000000000 R09: 0000000000000001
> [ 0.731704] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800bbaf3af0
> [ 0.732827] R13: ffff8800bbaf3af8 R14: 0000000000000000 R15: ffff8800baf3fc00
> [ 0.733863] FS: 0000000000000000(0000) GS:ffff8800bf800000(0000) knlGS:0000000000000000
> [ 0.735106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.735964] CR2: 0000000000000018 CR3: 000000003b219000 CR4: 00000000000006f0
> [ 0.737021] Stack:
> [ 0.737342] ffff8800bbf33998 0000000000000246 ffff8800bbaf3b18 ffff8800bbaf3b00
> [ 0.738602] 0000100000000001 0000000000000000 ffff8800bbaf3b00 ffff8800baf3f800
> [ 0.739807] 0000000300001000 ffff8800bbaf3b18 ffff8800bbf339e8 ffff8800baf3fc00
> [ 0.740993] Call Trace:
> [ 0.741393] [<ffffffff8d5efedd>] drm_setup_crtcs+0x3bd/0xb50
> [ 0.742252] [<ffffffff8d0edaa1>] ? mark_held_locks+0x71/0x90
> [ 0.743176] [<ffffffff8db0db6b>] ? __mutex_unlock_slowpath+0xeb/0x1c0
> [ 0.744138] [<ffffffff8d5f08c1>] drm_fb_helper_initial_config+0x81/0x3c0
> [ 0.745166] [<ffffffff8d6110b4>] ? drm_modeset_unlock_all+0x54/0x80
> [ 0.746136] [<ffffffff8d624fe0>] cirrus_fbdev_init+0xa0/0xb0
> [ 0.747033] [<ffffffff8d6245fb>] cirrus_modeset_init+0x18b/0x1e0
> .. snip snip ..
> [ 0.771023] Code: e8 ba e1 ff ff 48 8b 55 b8 48 83 f8 01 83 5d c4 ff 48 8b 7d b8 48 8b 82 98 02 00 00 49 8b 57 08 48 8b 40 10 48 8b 92 00 0a 00 00 <48> 83 7a 18 00 74 09 48 85 c0 0f 84 53 01 00 00 ff d0 49 89 c3

This decodes to

0: e8 ba e1 ff ff callq ...
5: 48 8b 55 b8 mov -0x48(%rbp),%rdx
9: 48 83 f8 01 cmp $0x1,%rax
d: 83 5d c4 ff sbbl $0xffffffff,-0x3c(%rbp)
11: 48 8b 7d b8 mov -0x48(%rbp),%rdi
15: 48 8b 82 98 02 00 00 mov 0x298(%rdx),%rax
1c: 49 8b 57 08 mov 0x8(%r15),%rdx
20: 48 8b 40 10 mov 0x10(%rax),%rax
24: 48 8b 92 00 0a 00 00 mov 0xa00(%rdx),%rdx
2b:* 48 83 7a 18 00 cmpq $0x0,0x18(%rdx) <-- trapping instruction
30: 74 09 je 0x3b
32: 48 85 c0 test %rax,%rax
35: 0f 84 53 01 00 00 je 0x18e
3b: ff d0 callq *%rax
3d: 49 89 c3 mov %rax,%r11


and trying to find matching code I think the oops is from this:

if (fb_helper->dev->mode_config.funcs->atomic_commit &&
!connector_funcs->best_encoder)
encoder = drm_atomic_helper_best_encoder(connector);
else
encoder = connector_funcs->best_encoder(connector);

in particular, the trapping instruction is the load of "atomic_commit".

(The "callq *%rax" seems to be the "connector_funcs->best_encoder()" call)

So I *think* we have "fb_helper->dev->mode_config.funcs" being NULL.

But I might have gotten the code matching wrong. I don't have the same
compiler as Eric, so even when using his config my code generation
doesn't really match. But there is only one indirect call in that
function, which is that ->best_encoder() call, so it does seem to
match up.

Adding Boris Brezillon and Daniel Vetter to the cc, since assuming I'm
right, the main suspect is commit c61b93fe51b1 ("drm/atomic: Fix
remaining places where !funcs->best_encoder is valid").

Boris? Daniel?

Linus