Re: linux 4.2.4 rcu_sched rolls over and barfs after debugger exits

From: Jeffrey Merkey
Date: Mon Oct 26 2015 - 02:19:47 EST


I coded a workaround and tested it. It seems to work.

Jeff

On 10/26/15, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
> I am calling these functions while polling the keyboard inside the
> debugger:
>
> touch_softlockup_watchdog();
> clocksource_touch_watchdog();
> touch_nmi_watchdog();
>
> Jeff
>
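For readers following along, here is a minimal user-space sketch of the
pattern described above: touch the watchdogs on every pass of the keyboard
poll loop. It is an illustration under stated assumptions, not the mdb code.
The three touch_*() functions are the ones named in the message, but here
they are stubbed as counters so the example compiles outside the kernel.
debugger_touch_watchdogs(), debugger_wait_for_key(), poll_key_fn and
key_after_n_polls() are hypothetical names invented for this sketch. The
rcu_cpu_stall_reset() call on exit is also an assumption: as far as I know
the mainline kgdb core calls it when touching watchdogs, but the thread does
not say whether the workaround uses it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space stand-ins (assumptions) for the kernel functions named in the
 * thread. In a real kernel build these are provided by the kernel itself;
 * here each one only counts how often it was called.
 */
static unsigned long softlockup_touches, clocksource_touches;
static unsigned long nmi_touches, rcu_stall_resets;

static void touch_softlockup_watchdog(void)  { softlockup_touches++; }
static void clocksource_touch_watchdog(void) { clocksource_touches++; }
static void touch_nmi_watchdog(void)         { nmi_touches++; }
static void rcu_cpu_stall_reset(void)        { rcu_stall_resets++; }

/* Hypothetical keyboard poll callback: returns true once a key is ready. */
typedef bool (*poll_key_fn)(void *ctx);

/* Hypothetical helper: the three calls listed in the message, grouped. */
static void debugger_touch_watchdogs(void)
{
	touch_softlockup_watchdog();
	clocksource_touch_watchdog();
	touch_nmi_watchdog();
}

/*
 * Spin until the poll callback reports a key, touching the watchdogs on
 * every pass so they do not fire while the system is held in the debugger.
 * Returns the number of passes that found no key.
 */
static unsigned long debugger_wait_for_key(poll_key_fn poll, void *ctx)
{
	unsigned long spins = 0;

	while (!poll(ctx)) {
		debugger_touch_watchdogs();
		spins++;
	}

	/* Assumption: also reset the RCU stall timer before resuming. */
	rcu_cpu_stall_reset();
	return spins;
}

/* Demo poll source: reports a key after *ctx unsuccessful polls. */
static bool key_after_n_polls(void *ctx)
{
	unsigned long *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Calling debugger_wait_for_key(key_after_n_polls, &n) with n set to 3 spins
three times and touches each watchdog three times. Note that none of the
three touch_*() calls addresses the RCU stall detector directly, which may
be why the rcu_sched warnings still appear after the debugger exits.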
> On 10/25/15, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
>> After I use the mdb kernel debugger and then exit, rcu_sched, due
>> to its own internal timers, rolls over and crashes when it does not
>> get the timeout window it likes. This is not caused by memory
>> corruption. It is caused by the debugger holding the system suspended:
>> when the system is allowed to run again, rcu_sched rolls over and dies.
>>
>> There are several things happening here -- lots of bugs, Linus ...
>>
>> Jeff
>>
>> sysrq: SysRq : MDB
>> INFO: rcu_sched detected stalls on CPUs/tasks:
>> (detected by 0, t=41279 jiffies, g=14721, c=14720, q=5)
>> All QSes seen, last rcu_sched kthread activity 41279
>> (-165477--206756), jiffies_till_next_fqs=3, root ->qsmask 0x0
>> NetworkManager R running 0 1703 1 0x00000080
>> c0bb6a28 c046d763 c0a895d9 00000000 000006a7 00000001 00000080 f64c1140
>> c0b535c0 00003981 c04a5126 c0a823a8 c0b53a91 0000a13f fffd799b fffcd85c
>> 00000003 00000000 00000096 00000000 00003981 3b9aca00 00003981 00003980
>> Call Trace:
>> [<c046d763>] ? sched_show_task+0xb3/0x120
>> [<c04a5126>] ? print_other_cpu_stall+0x276/0x2c0
>> [<c04a52e0>] ? __rcu_pending+0x170/0x210
>> [<c04a632f>] ? rcu_check_callbacks+0xbf/0x1a0
>> [<c04a8f48>] ? update_process_times+0x28/0x50
>> [<c04ba943>] ? tick_sched_handle+0x33/0x70
>> [<c04baa97>] ? tick_sched_timer+0x47/0xa0
>> [<c04aaefa>] ? __remove_hrtimer+0x4a/0x90
>> [<c04ab656>] ? __run_hrtimer+0x66/0x180
>> [<c04baa50>] ? tick_nohz_handler+0xd0/0xd0
>> [<c055f5e5>] ? __vfs_read+0xc5/0xf0
>> [<c04ab7f8>] ? __hrtimer_run_queues+0x88/0xc0
>> [<c04ab995>] ? hrtimer_interrupt+0x85/0x170
>> [<c0436746>] ? local_apic_timer_interrupt+0x26/0x50
>> [<c0451655>] ? irq_enter+0x5/0x50
>> [<c043679b>] ? smp_apic_timer_interrupt+0x2b/0x50
>> [<c090468d>] ? apic_timer_interrupt+0x2d/0x34
>> [<c0900000>] ? firmware_map_add_hotplug+0x45/0x141
>> rcu_sched kthread starved for 41279 jiffies! g14721 c14720 f0x2
>> fuse init (API version 7.23)
>> blk_update_request: I/O error, dev fd0, sector 0
>> floppy: error -5 while reading block 0
>> blk_update_request: I/O error, dev fd0, sector 0
>> floppy: error -5 while reading block 0
>> sysrq: SysRq : MDB
>> INFO: rcu_sched detected stalls on CPUs/tasks:
>> (detected by 0, t=21939 jiffies, g=17972, c=17971, q=3)
>> All QSes seen, last rcu_sched kthread activity 21939
>> (-124010--145949), jiffies_till_next_fqs=3, root ->qsmask 0x0
>> rtkit-daemon R running 0 2878 1 0x00000080
>> c0bb6a28 c046d763 c0a895d9 00000000 00000b3e 00000001 00000080 f64c1140
>> c0b535c0 00004634 c04a5126 c0a823a8 c0b53a91 000055b3 fffe1b96 fffdc5e3
>> 00000003 00000000 00000086 00000000 00004634 f69ec5cc 00004634 00004633
>> Call Trace:
>> [<c046d763>] ? sched_show_task+0xb3/0x120
>> [<c04a5126>] ? print_other_cpu_stall+0x276/0x2c0
>> [<c04a52e0>] ? __rcu_pending+0x170/0x210
>> [<c04a632f>] ? rcu_check_callbacks+0xbf/0x1a0
>> [<c04a8f48>] ? update_process_times+0x28/0x50
>> [<c04ba943>] ? tick_sched_handle+0x33/0x70
>> [<c04baa97>] ? tick_sched_timer+0x47/0xa0
>> [<c04aaefa>] ? __remove_hrtimer+0x4a/0x90
>> [<c04ab656>] ? __run_hrtimer+0x66/0x180
>> [<c04baa50>] ? tick_nohz_handler+0xd0/0xd0
>> [<c083a719>] ? __kmalloc_reserve+0x29/0x80
>> [<c04ab7f8>] ? __hrtimer_run_queues+0x88/0xc0
>> [<c04ab995>] ? hrtimer_interrupt+0x85/0x170
>> [<c0486507>] ? __wake_up_common+0x47/0x70
>> [<c0436746>] ? local_apic_timer_interrupt+0x26/0x50
>> [<c0451655>] ? irq_enter+0x5/0x50
>> [<c043679b>] ? smp_apic_timer_interrupt+0x2b/0x50
>> [<c090468d>] ? apic_timer_interrupt+0x2d/0x34
>> [<c05689b0>] ? legitimize_path+0x50/0x50
>> [<c056b8e5>] ? lookup_fast+0x155/0x2d0
>> [<c0568fbd>] ? generic_permission+0xcd/0x100
>> [<c056ba9a>] ? walk_component+0x3a/0x1f0
>> [<c08334f5>] ? SYSC_sendto+0x125/0x150
>> [<c056d1a6>] ? path_lookupat+0x56/0xf0
>> [<c056d48b>] ? filename_lookup+0x8b/0x150
>> [<f9cd02c2>] ? nl80211_send_bss.clone.4+0xe2/0x490 [cfg80211]
>> [<c056946e>] ? getname_flags+0x3e/0x1b0
>> [<c056948d>] ? getname_flags+0x5d/0x1b0
>> [<c05641fe>] ? vfs_fstatat+0x4e/0xa0
>> [<c0564308>] ? vfs_stat+0x18/0x20
>> [<c056464a>] ? SyS_stat64+0x1a/0x40
>> [<c0834535>] ? SyS_socketcall+0x235/0x300
>> [<c04da94c>] ? __audit_syscall_entry+0x9c/0x100
>> [<c0903b48>] ? sysenter_do_call+0x12/0x12
>> rcu_sched kthread starved for 21939 jiffies! g17972 c17971 f0x2
>> [root@aya ~]#
>>
>
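To put the "t=" values from the stall reports above into wall-clock terms,
a small conversion helper can be sketched as follows. The HZ value of the
reporting kernel is not given in the thread, so the tick rate is a
parameter here and any particular value is an assumption.
jiffies_to_msecs_at() is a hypothetical name for this sketch, not the
kernel's own jiffies_to_msecs().

```c
/*
 * Convert a jiffies count to milliseconds for a given tick rate.
 * "hz" is an assumption supplied by the caller; the thread does not
 * state the kernel's configured HZ.
 */
static unsigned long long jiffies_to_msecs_at(unsigned long long jiffies,
					      unsigned int hz)
{
	return jiffies * 1000ULL / hz;
}
```

If HZ were 1000, the two reports (t=41279 and t=21939) would correspond to
roughly 41.3 and 21.9 seconds. If HZ were 250, they would be roughly 165
and 88 seconds. Either way, both are far longer than a normal grace period,
which is consistent with the system having been held in the debugger.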