Re: sched,numa: invalid memory access in account_entity_dequeue

From: Sasha Levin
Date: Tue May 06 2014 - 08:24:11 EST


On 05/06/2014 07:08 AM, Peter Zijlstra wrote:
> On Sat, May 03, 2014 at 09:16:00AM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel, I've stumbled on the following:
>>
>
> Cute.. not making sense.. :-)
>
>> [ 1796.591361] BUG: unable to handle kernel paging request at fffffffedf97f040
>> [ 1796.592665] IP: __cpu_to_node (arch/x86/mm/numa.c:777)
>
> I suppose you've scripted this addr2line -ie vmlinux for all addresses in this splat?

Yeah, I'm trying to get that script upstream (https://lkml.org/lkml/2014/3/29/1)
since it seems to simplify looking at stack traces.

>> [ 1796.593710] PGD 21e30067 PUD 0
>> [ 1796.594174] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> [ 1796.594937] Dumping ftrace buffer:
>> [ 1796.595678] (ftrace buffer empty)
>> [ 1796.596329] Modules linked in:
>> [ 1796.596733] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 3.15.0-rc3-next-20140502-sasha-00019-g5cb1c98 #431
>> [ 1796.598143] task: ffff8803345b8000 ti: ffff880035fc0000 task.ti: ffff880035fc0000
>> [ 1796.598975] RIP: __cpu_to_node (arch/x86/mm/numa.c:777)
>> [ 1796.600093] RSP: 0018:ffff8800a6c03b88 EFLAGS: 00010046
>> [ 1796.600197] RAX: ffff8806e791a000 RBX: ffffffffe791a028 RCX: 0000000000000000
>> [ 1796.600197] RDX: 0000000000000001 RSI: ffff8806cdc68068 RDI: 00000000e791a028
>> [ 1796.600197] RBP: ffff8800a6c03b98 R08: ffff880496183078 R09: 00000000000151c6
>> [ 1796.600197] R10: 000000000000b731 R11: 0000000000000001 R12: ffff8801b4dd7840
>> [ 1796.600197] R13: 0000000000000000 R14: 000000000000001e R15: ffff8801b34ac1a0
>> [ 1796.600197] FS: 0000000000000000(0000) GS:ffff8800a6c00000(0000) knlGS:0000000000000000
>> [ 1796.600197] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 1796.600197] CR2: fffffffedf97f040 CR3: 0000000021e2d000 CR4: 00000000000006a0
>> [ 1796.610323] Stack:
>> [ 1796.610323] 0000000000000000 ffff8801b34ac1a0 ffff8800a6c03bd8 ffffffff9d1a9646
>> [ 1796.610323] ffff8800a6c03bd8 ffff8806cdc68068 ffff8806cdc68068 ffff8801b34ac1a0
>> [ 1796.610323] 0000000000000000 000000000000b7db ffff8800a6c03c38 ffffffff9d1ae987
>> [ 1796.610323] Call Trace:
>> [ 1796.610323] <IRQ>
>> [ 1796.610323] account_entity_dequeue (kernel/sched/fair.c:859 kernel/sched/fair.c:2009)
>> [ 1796.610323] dequeue_entity (kernel/sched/fair.c:2827)
>> [ 1796.610323] dequeue_task_fair (kernel/sched/fair.c:3907 include/linux/jump_label.h:105 kernel/sched/fair.c:3041 kernel/sched/fair.c:3217 kernel/sched/fair.c:3915)
>> [ 1796.610323] dequeue_task (kernel/sched/core.c:793)
>> [ 1796.610323] deactivate_task (kernel/sched/core.c:809)
>> [ 1796.610323] move_task (kernel/sched/fair.c:5032)
>> [ 1796.610323] load_balance (kernel/sched/fair.c:5305 kernel/sched/fair.c:6485)
>> [ 1796.610323] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
>> [ 1796.610323] rebalance_domains (kernel/sched/fair.c:7032)
>> [ 1796.610323] ? rebalance_domains (kernel/sched/fair.c:6975)
>> [ 1796.610323] run_rebalance_domains (kernel/sched/fair.c:7105 kernel/sched/fair.c:7198)
>> [ 1796.610323] __do_softirq (kernel/softirq.c:269 include/linux/jump_label.h:105 include/trace/events/irq.h:126 kernel/softirq.c:270)
>> [ 1796.610323] ? irq_exit (include/linux/vtime.h:82 include/linux/vtime.h:121 kernel/softirq.c:384)
>> [ 1796.610323] irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
>> [ 1796.610323] scheduler_ipi (kernel/sched/core.c:1545)
>> [ 1796.610323] smp_reschedule_interrupt (arch/x86/kernel/smp.c:266)
>> [ 1796.610323] reschedule_interrupt (arch/x86/kernel/entry_64.S:1178)
>> [ 1796.610323] <EOI>
>> [ 1796.610323] ? native_safe_halt (arch/x86/include/asm/irqflags.h:50)
>> [ 1796.610323] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
>> [ 1796.637135] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
>> [ 1796.637135] arch_cpu_idle (arch/x86/kernel/process.c:302)
>> [ 1796.637135] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
>> [ 1796.637135] cpu_startup_entry (??:?)
>> [ 1796.637135] start_secondary (arch/x86/kernel/smpboot.c:267)
>> [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 48 83 c4
>
>
> Could you maybe also do the same with the Code? -- that is, script an auto-decode for it?
>
> Obviously scripts/decodecode doesn't actually work right anymore:
>
> # echo [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4 | ./scripts/decodecode
> -bash: syntax error near unexpected token `48'
>
> But if I remove the <> by hand I get:
>
> # echo [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 48 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4 | ./scripts/decodecode
> [ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 0f 1f 40 00 48 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 00 00 8b 04 10 48 83 c4
> sed: -e expression #1, char 1: unknown command: `-'
>
> Code starting with the faulting instruction
> ===========================================
>    0:   3a ea                   cmp    %dl,%ch
>    2:   05 00 74 25 89          add    $0x89257400,%eax
>    7:   de 48 c7                fimul  -0x39(%rax)
>    a:   c7                      (bad)
>    b:   08 b4 6c a1 31 c0 e8    or     %dh,-0x173fce5f(%rsp,%rbp,2)
>   12:   99                      cltd
>   13:   6c                      insb   (%dx),%es:(%rdi)
>   14:   45 03 e8                add    %r8d,%r13d
>   17:   7c 39                   jl     0x52
>   19:   46 03 48 8b             rex.RX add -0x75(%rax),%r9d
>   1d:   05 71 3a ea 05          add    $0x5ea3a71,%eax
>   22:   8b 04 98                mov    (%rax,%rbx,4),%eax
>   25:   eb 16                   jmp    0x3d
>   27:   0f 1f 40 00             nopl   0x0(%rax)
>   2b:   48 8b 14 dd 00 ef 0a    mov    -0x5cf51100(,%rbx,8),%rdx
>   32:   a3
>   33:   48 c7 c0 d8 00 00 8b    mov    $0xffffffff8b0000d8,%rax
>   3a:   04 10                   add    $0x10,%al
>   3c:   48                      rex.W
>   3d:   83                      .byte 0x83
>   3e:   c4                      .byte 0xc4
>
> And 2b is the offset where the <> was.

Sure, I can look into that.
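
Two things seem to trip up the commands above. An unquoted <48> is parsed by bash as redirections, so the echo argument needs quoting. The sed error presumably comes from the "[ 1796.637135]" prefix still being attached to the line. A rough sketch of the preprocessing an auto-decode step would need, in Python (parse_code_line is a made-up helper name, not something in scripts/):

```python
import re


def parse_code_line(line):
    """Return (code_bytes, fault_offset) parsed from an oops 'Code:' line.

    fault_offset is the index of the byte that was wrapped in <>, or None
    if the line carries no such marker.
    """
    # Drop everything up to and including 'Code:'; this also discards a
    # leading '[ 1796.637135]' style timestamp.
    _, sep, tail = line.partition("Code:")
    if not sep:
        raise ValueError("no 'Code:' marker in line")

    code = bytearray()
    fault_offset = None
    for token in tail.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            # The <> byte is the first byte of the faulting instruction.
            fault_offset = len(code)
            token = marked.group(1)
        code.append(int(token, 16))
    return bytes(code), fault_offset


# The Code: line from the splat in this thread.
OOPS_CODE = (
    "[ 1796.637135] Code: 3a ea 05 00 74 25 89 de 48 c7 c7 08 b4 6c a1 31 "
    "c0 e8 99 6c 45 03 e8 7c 39 46 03 48 8b 05 71 3a ea 05 8b 04 98 eb 16 "
    "0f 1f 40 00 <48> 8b 14 dd 00 ef 0a a3 48 c7 c0 d8 f4 00 00 8b 04 10 "
    "48 83 c4"
)

code, fault_offset = parse_code_line(OOPS_CODE)
print(f"{len(code)} bytes, faulting instruction at offset {fault_offset:#x}")
```

The reported offset matches the 2b found by hand. The bytes from that offset onward could then be handed to a disassembler on their own, which avoids decoding from the middle of an earlier instruction.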

> Anyway, the reason I did this was that I was hoping to find the cpu argument in one of the registers, but looking at your RBX value doesn't really help.
>
>
> If I compile this function with a defconfig based .config, I get something like:
>
> 00000000000000a0 <__cpu_to_node>:
>   a0:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # a8 <__cpu_to_node+0x8>
>   a7:   00
>   a8:   55                      push   %rbp
>   a9:   48 89 e5                mov    %rsp,%rbp
>   ac:   53                      push   %rbx
>   ad:   48 63 df                movslq %edi,%rbx
>   b0:   75 15                   jne    c7 <__cpu_to_node+0x27>
>   b2:   48 8b 14 dd 00 00 00    mov    0x0(,%rbx,8),%rdx
>   b9:   00
>   ba:   48 c7 c0 00 00 00 00    mov    $0x0,%rax
>   c1:   8b 04 10                mov    (%rax,%rdx,1),%eax
>   c4:   5b                      pop    %rbx
>   c5:   5d                      pop    %rbp
>   c6:   c3                      retq
>   c7:   89 de                   mov    %ebx,%esi
>   c9:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>   d0:   31 c0                   xor    %eax,%eax
>   d2:   e8 00 00 00 00          callq  d7 <__cpu_to_node+0x37>
>   d7:   e8 00 00 00 00          callq  dc <__cpu_to_node+0x3c>
>   dc:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # e3 <__cpu_to_node+0x43>
>   e3:   8b 04 98                mov    (%rax,%rbx,4),%eax
>   e6:   eb dc                   jmp    c4 <__cpu_to_node+0x24>
>   e8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
>   ef:   00
>
>
> And the b2 offset matches up fairly nicely, although the rest of the decode appears to be crap. Still no hints though.
>
> However, the calling convention puts the first argument in EDI (hence the movslq %edi,%rbx above), and at b2 EDI should still contain the original value; however, your RDI value is complete nonsense again :/
>
> Of course, the cpu argument being complete crap is a good reason for this to happen. Which would make thread_info::cpu of the task in question be complete crap.. and I'm not sure I can explain that either.
>
> la-la-la..
>

I haven't seen it happening again, so maybe an unrelated memory corruption?
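
For what it's worth, the register dump does look self-consistent with a garbage cpu argument. Below is a small Python cross-check, assuming the faulting instruction is the "mov -0x5cf51100(,%rbx,8),%rdx" at offset 2b and that RBX was loaded by "movslq %edi,%rbx" as in the defconfig listing (sign_extend and fault_address are just illustrative helper names):

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def fault_address(edi, disp32, scale=8):
    """Effective address of 'mov disp32(,%rbx,scale),%rdx' after movslq %edi,%rbx."""
    index = sign_extend(edi, 32)    # movslq %edi,%rbx
    disp = sign_extend(disp32, 32)  # disp32 is sign-extended to 64 bits
    return (disp + index * scale) & MASK64


# Values copied from the splat.
RDI = 0x00000000e791a028
RBX = 0xffffffffe791a028
CR2 = 0xfffffffedf97f040
# Displacement bytes "00 ef 0a a3" of the faulting instruction, little-endian.
DISP32 = 0xa30aef00

rbx = sign_extend(RDI & 0xffffffff, 32) & MASK64
addr = fault_address(RDI & 0xffffffff, DISP32)
print(f"rbx  = {rbx:#018x}")
print(f"addr = {addr:#018x}")
```

Both values line up with the dump: the computed RBX equals the reported RBX, and the computed address equals CR2. So __cpu_to_node() really does seem to have been called with cpu == 0xe791a028, which fits the theory that the cpu value itself was bogus. The low bits of that value also resemble the low half of RAX (ffff8806e791a000), though I don't know what to make of that.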


Thanks,
Sasha
