Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Jiri Slaby
Date: Fri Feb 26 2016 - 04:23:27 EST


On 02/26/2016, 09:56 AM, Jiri Slaby wrote:
>> I really don't see how it would happen here - that code doesn't look
>> particularly odd.

Funnily enough, this is what I got today when booting 4.4.2 in a QEMU VM
on my host.

The crashing RIP (ffffffff810f28d5) is the action->dev_id dereference in
handle_irq_event_percpu. Look:
0xffffffff810f28d5 <+101>: mov 0x8(%rbx),%rsi
0xffffffff810f28d9 <+105>: mov %r12d,%edi
0xffffffff810f28dc <+108>: callq *(%rbx)
which is
trace_irq_handler_entry(irq, action);
res = action->handler(irq, action->dev_id);
trace_irq_handler_exit(irq, action, res);
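
To make the mapping between the disassembly and the C code explicit, here is a minimal user-space sketch. It assumes a 64-bit build and models only the first two members of the action structure; mock_irqaction, dev_id_load_address and check_layout are hypothetical names for illustration, not kernel code. With the handler at offset 0 and dev_id at offset 8, a NULL action (RBX is 0 in the oops below) gives a load from address 8, which matches CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct irqaction. Only the first
 * two members are modelled; the real structure has more fields after them. */
typedef int (*mock_handler_t)(int irq, void *dev_id);

struct mock_irqaction {
    mock_handler_t handler; /* offset 0: target of "callq *(%rbx)" */
    void *dev_id;           /* offset 8: source of "mov 0x8(%rbx),%rsi" */
};

/* Address the CPU reads when it loads action->dev_id. */
uintptr_t dev_id_load_address(uintptr_t action)
{
    return action + offsetof(struct mock_irqaction, dev_id);
}

/* Sanity check of the layout assumption (pointer-sized members, no padding). */
void check_layout(void)
{
    assert(offsetof(struct mock_irqaction, handler) == 0);
    assert(offsetof(struct mock_irqaction, dev_id) == sizeof(void *));
}
```

Calling dev_id_load_address(0) models the NULL action case; on x86-64 it yields 0x8.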

Now I am a bit worried: a crash involving percpu and tracing together? I
have already seen this pattern inlined in try_to_wake_up (see
ffffffff810a54af in core.s [1]).

try_to_wake_up
-> ttwu_queue
-> ttwu_queue_remote
-> trace_sched_wake_idle_without_ipi
-> ttwu_stat ** CRASH somewhere here

So is this the same bug or not?

[1] http://labs.suse.cz/jslaby/bug-968218/

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.2-13.g19ca782-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
task: ffffffff81e12540 ti: ffffffff81e00000 task.ti: ffffffff81e00000
RIP: 0010:[<ffffffff810f28d5>] [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
RSP: 0018:ffff880093e03d88 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000000000000000f
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffff880093e03dc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000004
R13: ffff880087c3b058 R14: 0000000000000000 R15: ffffffff81e03df8
FS: 0000000000000000(0000) GS:ffff880093e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000008a790000 CR4: 00000000000006f0
Stack:
ffff880087c3b000 0000000087c3b0d8 ffff880087c3b058 ffff880087c3b000
ffff880087c3b0d8 ffff880087c3b058 0000000000000034 ffffffff81e03df8
ffff880093e03df0 ffffffff810f2bec ffff880087c3b000 ffff880087c3b0d8
Call Trace:
[<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
[<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
[<ffffffff8101f49d>] handle_irq+0x1d/0x30
[<ffffffff81751ac1>] do_IRQ+0x61/0x120
[<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
Full inexact backtrace again:

<IRQ>
[<ffffffff810f2bec>] handle_irq_event+0x3c/0x60
[<ffffffff810f5f60>] handle_edge_irq+0x80/0x150
[<ffffffff8101f49d>] handle_irq+0x1d/0x30
[<ffffffff81751ac1>] do_IRQ+0x61/0x120
[<ffffffff8174f80c>] common_interrupt+0x8c/0x8c
[<ffffffff8108bae7>] ? __do_softirq+0xa7/0x470
[<ffffffff8108bae0>] ? __do_softirq+0xa0/0x470
[<ffffffff8108c053>] irq_exit+0xb3/0xc0
[<ffffffff81751bc2>] smp_apic_timer_interrupt+0x42/0x50
[<ffffffff8174fb9c>] apic_timer_interrupt+0x8c/0xa0
<EOI>
[<ffffffff81067c96>] ? native_safe_halt+0x6/0x10
[<ffffffff810dcaed>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81027753>] default_idle+0x23/0x170
[<ffffffff8102808f>] arch_cpu_idle+0xf/0x20
[<ffffffff810d270a>] default_idle_call+0x2a/0x40
[<ffffffff810d2b07>] cpu_startup_entry+0x387/0x400
[<ffffffff8173fef6>] rest_init+0x136/0x140
[<ffffffff81f59fe3>] start_kernel+0x499/0x4a6
[<ffffffff81f59120>] ? early_idt_handler_array+0x120/0x120
[<ffffffff81f59339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81f59476>] x86_64_start_kernel+0x13b/0x14a
Code: 7e 48 8b 05 5e 58 e2 00 e8 79 8e 00 00 85 c0 74 0d 80 3d 54 3a e2 00 00 0f 84 db 01 00 00 65 ff 0d 01 96 f1 7e 0f 84 89 01 00 00 <48> 8b 73 08 44 89 e7 ff 13 41 89 c5 0f 1f 44 00 00 65 ff 05 e3
RIP [<ffffffff810f28d5>] handle_irq_event_percpu+0x65/0x340
RSP <ffff880093e03d88>
CR2: 0000000000000008
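
As a cross-check, the byte marked <48> in the Code: line is the first byte of the faulting instruction, and the four bytes 48 8b 73 08 can be decoded by hand. The sketch below is only an illustration under that reading: decode_modrm and decode_faulting_insn are hypothetical helpers, and they handle just this one instruction form (REX.W prefix, MOV opcode 8b, ModRM with an 8-bit displacement).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod; /* addressing mode: 1 means [base + disp8] */
    unsigned reg; /* register operand: 6 is rsi */
    unsigned rm;  /* base register: 3 is rbx */
};

/* Split a ModRM byte into its 2-bit, 3-bit and 3-bit fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { byte >> 6, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/* Decode "48 8b <modrm> <disp8>" and return the signed displacement. */
int decode_faulting_insn(const uint8_t insn[4], struct modrm *out)
{
    assert(insn[0] == 0x48); /* REX.W prefix: 64-bit operand size */
    assert(insn[1] == 0x8b); /* opcode: MOV r64, r/m64 */
    *out = decode_modrm(insn[2]);
    return (int8_t)insn[3];
}
```

For the bytes 48 8b 73 08 this gives mod 1, reg 6 (rsi), rm 3 (rbx) and a displacement of 8, i.e. mov 0x8(%rbx),%rsi, the same instruction as at <+101> in the disassembly.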

thanks,
--
js
suse labs