Re: sched: hang in migrate_swap
From: Michael wang
Date: Thu Apr 10 2014 - 03:00:16 EST
On 04/10/2014 11:31 AM, Sasha Levin wrote:
[snip]
>
> I'd like to re-open this issue. It seems that something broke, and I'm
> again seeing the same issues that went away two months ago with this
> patch.
A new mechanism has been designed to move the priority checking inside
idle_balance(); I'm adding Kirill, who designed it, to the CC ;-)
Regards,
Michael Wang
>
> Stack trace is similar to before:
>
> [ 6004.990292] CPU: 20 PID: 26054 Comm: trinity-c58 Not tainted 3.14.0-next-20140409-sasha-00022-g984f7c5-dirty #385
> [ 6004.990292] task: ffff880375bb3000 ti: ffff88036058e000 task.ti: ffff88036058e000
> [ 6004.990292] RIP: generic_exec_single (kernel/smp.c:91 kernel/smp.c:175)
> [ 6004.990292] RSP: 0000:ffff88036058f978 EFLAGS: 00000202
> [ 6004.990292] RAX: ffff8802b71dec00 RBX: ffff88036058f978 RCX: ffff8802b71decd8
> [ 6004.990292] RDX: ffff8802b71d85c0 RSI: ffff88036058f978 RDI: ffff88036058f978
> [ 6004.990292] RBP: ffff88036058f9c8 R08: 0000000000000001 R09: ffffffffa70bc580
> [ 6004.990292] R10: ffff880375bb3000 R11: 0000000000000000 R12: 000000000000000c
> [ 6004.990292] R13: 0000000000000001 R14: ffff88036058fa20 R15: ffffffffa121f560
> [ 6004.990292] FS: 00007fe993fbd700(0000) GS:ffff880437000000(0000) knlGS:0000000000000000
> [ 6004.990292] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6004.990292] CR2: 00007fffb56b0a18 CR3: 00000003755df000 CR4: 00000000000006a0
> [ 6004.990292] DR0: 0000000000695000 DR1: 0000000000695000 DR2: 0000000000000000
> [ 6004.990292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> [ 6004.990292] Stack:
> [ 6004.990292] ffff88040513da18 ffffffffa121f560 ffff88036058fa20 0000000000000002
> [ 6004.990292] 000000000000000c 000000000000000c ffffffffa121f560 ffff88036058fa20
> [ 6004.990292] 0000000000000001 ffff880189fe3000 ffff88036058fa08 ffffffffa11ff7b2
> [ 6004.990292] Call Trace:
> [ 6004.990292] ? cpu_stop_queue_work (kernel/stop_machine.c:227)
> [ 6004.990292] ? cpu_stop_queue_work (kernel/stop_machine.c:227)
> [ 6004.990292] smp_call_function_single (kernel/smp.c:234 (discriminator 7))
> [ 6004.990292] ? lg_local_lock (kernel/locking/lglock.c:25)
> [ 6004.990292] stop_two_cpus (kernel/stop_machine.c:297)
> [ 6004.990292] ? retint_restore_args (arch/x86/kernel/entry_64.S:1040)
> [ 6004.990292] ? __stop_cpus (kernel/stop_machine.c:170)
> [ 6004.990292] ? __stop_cpus (kernel/stop_machine.c:170)
> [ 6004.990292] ? __migrate_swap_task (kernel/sched/core.c:1042)
> [ 6004.990292] migrate_swap (kernel/sched/core.c:1110)
> [ 6004.990292] task_numa_migrate (kernel/sched/fair.c:1321)
> [ 6004.990292] ? task_numa_migrate (kernel/sched/fair.c:1227)
> [ 6004.990292] ? sched_clock_cpu (kernel/sched/clock.c:311)
> [ 6004.990292] numa_migrate_preferred (kernel/sched/fair.c:1342)
> [ 6004.990292] task_numa_fault (kernel/sched/fair.c:1796)
> [ 6004.990292] __handle_mm_fault (mm/memory.c:3812 mm/memory.c:3812 mm/memory.c:3925)
> [ 6004.990292] ? __const_udelay (arch/x86/lib/delay.c:126)
> [ 6004.990292] ? __rcu_read_unlock (kernel/rcu/update.c:97)
> [ 6004.990292] handle_mm_fault (include/linux/memcontrol.h:147 mm/memory.c:3951)
> [ 6004.990292] __do_page_fault (arch/x86/mm/fault.c:1220)
> [ 6004.990292] ? vtime_account_user (kernel/sched/cputime.c:687)
> [ 6004.990292] ? get_parent_ip (kernel/sched/core.c:2472)
> [ 6004.990292] ? context_tracking_user_exit (include/linux/vtime.h:89 include/linux/jump_label.h:105 include/trace/events/context_tracking.h:47 kernel/context_tracking.c:178)
> [ 6004.990292] ? preempt_count_sub (kernel/sched/core.c:2527)
> [ 6004.990292] ? context_tracking_user_exit (kernel/context_tracking.c:182)
> [ 6004.990292] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 6004.990292] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
> [ 6004.990292] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
> [ 6004.990292] do_async_page_fault (arch/x86/kernel/kvm.c:263)
> [ 6004.990292] async_page_fault (arch/x86/kernel/entry_64.S:1496)
> [ 6004.990292] Code: 44 89 e7 ff 15 70 2d c5 04 45 85 ed 75 0b 31 c0 eb 27 0f 1f 80 00 00 00 00 f6 43 18 01 74 ef 66 2e 0f 1f 84 00 00 00 00 00 f3 90 <f6> 43 18 01 75 f8 eb db 66 0f 1f 44 00 00 48 83 c4 28 5b 41 5c
>
>
> Thanks,
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>