Re: [BUG] local_softirq_pending storm

From: Michal Piotrowski
Date: Tue May 22 2007 - 16:00:41 EST


Michal Piotrowski napisaÅ(a):
> Hi Thomas,
>
> On 20/05/07, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Sun, 2007-05-20 at 02:53 +0530, Anant Nitya wrote:
>> > > 1 == TASK_INTERRUPTIBLE, so we know that ksoftirqd was not woken
>> up. At
>> > > least it is not a scheduler problem.
>> > >
>> > > I work out a more complex debug patch and pester you to test once I'm
>> > > done.
>> > No problem :)
>>
>> You asked for it :)
>>
>> Please patch 2.6.22-rc2 with
>>
>> http://tglx.de/projects/hrtimers/2.6.22-rc2/patch-2.6.22-rc2-hrt2.patch
>> and
>> http://www.tglx.de/private/tglx/ht-debug/tracer.diff
>>
>> Compile it with the config
>>
>> http://www.tglx.de/private/tglx/ht-debug/config.debug
>>
>> You should find something like:
>>
>> ( swapper-0 |#0): new 67173 us user-latency.
>>
>> along with the familiar "NOHZ ......" message in your log file.
>>
>> Once that happened please do:
>>
>> $ cat /proc/latency_trace >trace.txt
>>
>> compress it and send it to me along with the full dmesg output or put
>> both up to some place, where I can download it.
>>
>> Michal,
>>
>> IIRC you encountered the same P4/HT related wreckage. Can you do the
>> same ?
>
> Good news - I can't reproduce this bug. It's time to remove
> "Subject : 2.6.21-git4 BUG: soft lockup detected on CPU#1!
> References : http://lkml.org/lkml/2007/5/2/511
> Submitter : Michal Piotrowski <michal.k.k.piotrowski@xxxxxxxxx>
> Handled-By : Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Status : problem is being debugged"
> from KR.
>
> Bad news - I hit a bug in 2.6.22-rc2-hrt3. Bug symptoms:
> - X hangs (keyboard, mouse, sound etc.)
> - only Magic SysRq works
>
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc2-hrt3/hrt-config
>
> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc2-hrt3/hrt-console.log
>
>
> Now I'm trying to apply hrtimers debug patch ontop 2.6.22-rc2-hrt3
> http://lkml.org/lkml/2007/3/23/106
>

It _almost_ works

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc2-hrt3/hrtimers_debug.patch

Hmmm..

[ 135.206505] SysRq : Show Pending Timers
[ 135.210476] Timer List Version: v0.3
[ 135.214102] HRTIMER_MAX_CLOCK_BASES: 2
[ 135.217899] now at 112955954742 nsecs
[ 135.221651]
[ 135.221662] cpu: 0
[ 135.225246] clock 0:
[ 135.227588] .index: 0
[ 135.230545] .resolution: 1 nsecs
[ 135.234010] .get_time: ktime_get_real
[ 135.238151] .offset: 1179862680097808665 nsecs
[ 135.243207] active timers:
[ 135.245992] clock 1:
[ 135.248316] .index: 1
[ 135.251246] .resolution: 1 nsecs
[ 135.254722] .get_time: ktime_get
[ 135.258473] .offset: 0 nsecs
[ 135.261923] active timers:
[ 135.264683] #0: hardirq_stack, tick_sched_timer, S:01, tick_nohz_restart_sched_tick, swapper/0
[ 135.273913] # expires at 112952000000 nsecs [in 18446744073705596874 nsecs]
[ 135.281194] #1: hardirq_stack, hrtimer_wakeup, S:01, futex_wait, automount/2293
[ 135.289057] # expires at 113902202172 nsecs [in 946247430 nsecs]
[ 135.295233] #2: hardirq_stack, hrtimer_wakeup, S:01, do_nanosleep, crond/2373
[ 135.302973] # expires at 121028637388 nsecs [in 8072682646 nsecs]
[ 135.309279] #3: hardirq_stack, it_real_fn, S:01, do_setitimer, syslogd/2171
[ 135.316701] # expires at 121465401717 nsecs [in 8509446975 nsecs]
[ 135.323035] .expires_next : 112952000000 nsecs
[ 135.327921] .exp_prev : 112951000000 nsecs
[ 135.332786] last expires_next stacktrace:
[ 135.336944] update_cpu_base_expires_next
[ 135.341188] hrtimer_interrupt
[ 135.344430] smp_apic_timer_interrupt
[ 135.348310] apic_timer_interrupt
[ 135.351845] __mcount
[ 135.354327] mcount
[ 135.356627] permission
[ 135.359298] vfs_permission
[ 135.362305] __link_path_walk
[ 135.365497] link_path_walk
[ 135.368486] path_walk
[ 135.371044] do_path_lookup
[ 135.374086] __user_walk_fd
[ 135.377122] vfs_stat_fd
[ 135.379905] vfs_stat
[ 135.382385] sys_stat64
[ 135.385065] syscall_call
[ 135.387909] <3>BUG: sleeping function called from invalid context at kernel/mutex.c:86
[ 135.396175] in_atomic():1, irqs_disabled():1
[ 135.400541] 2 locks held by readahead/2442:
[ 135.404783] #0: (&serio->lock){++..}, at: [<c02dbf17>] serio_interrupt+0x22/0x79
[ 135.412623] #1: (sysrq_key_table_lock){+...}, at: [<c027a4af>] __handle_sysrq+0x1b/0x10c
[ 135.421208] irq event stamp: 3249062
[ 135.424856] hardirqs last enabled at (3249061): [<c0181b67>] kmem_cache_free+0x97/0xa1
[ 135.433050] hardirqs last disabled at (3249062): [<c0104bd4>] common_interrupt+0x24/0x34
[ 135.441350] softirqs last enabled at (3248980): [<c012a418>] __do_softirq+0xe8/0xef
[ 135.449261] softirqs last disabled at (3248975): [<c0106df7>] do_softirq+0x69/0xd6
[ 135.457049] [<c010526c>] show_trace_log_lvl+0x35/0x54
[ 135.462312] [<c0106027>] show_trace+0x2c/0x2e
[ 135.466876] [<c01060d0>] dump_stack+0x29/0x2b
[ 135.471423] [<c011de0c>] __might_sleep+0xe0/0xe2
[ 135.476221] [<c0360257>] mutex_lock+0x22/0x2e
[ 135.480767] [<c014eb96>] lookup_module_symbol_name+0x1a/0xc4
[ 135.486739] [<c014f514>] lookup_symbol_name+0x62/0x69
[ 135.492021] [<c013eb45>] print_name_offset+0x29/0x81
[ 135.497200] [<c013f4c5>] timer_list_show+0x657/0xc55
[ 135.502368] [<c013fae2>] sysrq_timer_list_show+0x1f/0x21
[ 135.507859] [<c027a5fd>] sysrq_handle_show_timers+0xd/0xf
[ 135.513451] [<c027a525>] __handle_sysrq+0x91/0x10c
[ 135.518507] [<c027a79e>] handle_sysrq+0x37/0x39
[ 135.523296] [<c0274b00>] kbd_event+0x335/0x58d
[ 135.527921] [<c02df806>] input_event+0x43e/0x460
[ 135.532718] [<c02e3766>] atkbd_interrupt+0x467/0x58d
[ 135.537887] [<c02dbf37>] serio_interrupt+0x42/0x79
[ 135.542865] [<c02dceed>] i8042_interrupt+0x1ff/0x210
[ 135.548130] [<c0160963>] handle_IRQ_event+0x24/0x54
[ 135.553187] [<c0161b2a>] handle_edge_irq+0xd7/0x11f
[ 135.558253] [<c0106f46>] do_IRQ+0xe2/0x10c
[ 135.562544] [<c0104bde>] common_interrupt+0x2e/0x34
[ 135.567675] [<c0116134>] mcount+0x14/0x18
[ 135.571858] [<c019b9ab>] mntput_no_expire+0xd/0x89
[ 135.576845] [<c018cd6a>] path_release+0x2f/0x33
[ 135.581645] [<c01893f4>] vfs_stat_fd+0x4c/0x55
[ 135.586276] [<c01894c9>] vfs_stat+0x25/0x27
[ 135.590660] [<c01894e9>] sys_stat64+0x1e/0x37
[ 135.595215] [<c01041f5>] syscall_call+0x7/0xb
[ 135.599752] =======================
[ 135.603365] ---------------------------
[ 135.607229] | preempt count: 00010000 ]
[ 135.611146] | 0-level deep critical section nesting:
[ 135.616202] ----------------------------------------
[ 135.621258]
[ 135.795930] ( readahead-2442 |#0): new 112512428 us user-latency.
[ 135.802591] stopped custom tracer.
[ 135.806058] BUG: at kernel/mutex.c:132 __mutex_lock_common()
[ 135.811824] [<c010526c>] show_trace_log_lvl+0x35/0x54
[ 135.817035] [<c0106027>] show_trace+0x2c/0x2e
[ 135.821557] [<c01060d0>] dump_stack+0x29/0x2b
[ 135.826078] [<c035fff2>] __mutex_lock_slowpath+0x64/0x2a7
[ 135.831714] [<c036025e>] mutex_lock+0x29/0x2e
[ 135.836277] [<c014eb96>] lookup_module_symbol_name+0x1a/0xc4
[ 135.842121] [<c014f514>] lookup_symbol_name+0x62/0x69
[ 135.847342] [<c013eb45>] print_name_offset+0x29/0x81
[ 135.852478] [<c013f4c5>] timer_list_show+0x657/0xc55
[ 135.857611] [<c013fae2>] sysrq_timer_list_show+0x1f/0x21
[ 135.863128] [<c027a5fd>] sysrq_handle_show_timers+0xd/0xf
[ 135.868744] [<c027a525>] __handle_sysrq+0x91/0x10c
[ 135.873689] [<c027a79e>] handle_sysrq+0x37/0x39
[ 135.878383] [<c0274b00>] kbd_event+0x335/0x58d
[ 135.882991] [<c02df806>] input_event+0x43e/0x460
[ 135.887806] [<c02e3766>] atkbd_interrupt+0x467/0x58d
[ 135.892948] [<c02dbf37>] serio_interrupt+0x42/0x79
[ 135.897918] [<c02dceed>] i8042_interrupt+0x1ff/0x210
[ 135.903053] [<c0160963>] handle_IRQ_event+0x24/0x54
[ 135.908120] [<c0161b2a>] handle_edge_irq+0xd7/0x11f
[ 135.913176] [<c0106f46>] do_IRQ+0xe2/0x10c
[ 135.917449] [<c0104bde>] common_interrupt+0x2e/0x34
[ 135.922502] [<c0116134>] mcount+0x14/0x18
[ 135.926696] [<c019b9ab>] mntput_no_expire+0xd/0x89
[ 135.931657] [<c018cd6a>] path_release+0x2f/0x33
[ 135.936378] [<c01893f4>] vfs_stat_fd+0x4c/0x55
[ 135.941001] [<c01894c9>] vfs_stat+0x25/0x27
[ 135.945376] [<c01894e9>] sys_stat64+0x1e/0x37
[ 135.949931] [<c01041f5>] syscall_call+0x7/0xb
[ 135.954467] =======================
[ 135.958091] ---------------------------
[ 135.962015] | preempt count: 00010000 ]
[ 135.965905] | 0-level deep critical section nesting:
[ 135.970927] ----------------------------------------
[ 135.975931]
[ 135.977468] <ffffffff>
[ 135.979865]
[ 135.981396] .hres_active : 1
[ 135.984757] .nr_events : 91663
[ 135.988465] .nohz_mode : 2
[ 135.991820] .idle_tick : 112686000000 nsecs
[ 135.996651] .tick_stopped : 0
[ 136.000016] .idle_jiffies : 4294779982
[ 136.004190] .idle_calls : 30792
[ 136.007882] .idle_sleeps : 3020
[ 136.011486] .idle_entrytime : 112932690671 nsecs
[ 136.016335] .idle_sleeptime : 18007304053 nsecs
[ 136.021063] .last_jiffies : 4294780229
[ 136.025220] .next_jiffies : 4294780556
[ 136.029352] .idle_expires : 112855000000 nsecs
[ 136.034167] jiffies: 4294780248
[ 136.037331]
[ 136.037336] cpu: 1
[ 136.040884] clock 0:
[ 136.043182] .index: 0
[ 136.046087] .resolution: 1 nsecs
[ 136.049535] .get_time: ktime_get_real
[ 136.053739] .offset: 1179862680097808665 nsecs
[ 136.058751] active timers:
[ 136.061501] clock 1:
[ 136.063877] .index: 1
[ 136.066781] .resolution: 1 nsecs
[ 136.070229] .get_time: ktime_get
[ 136.073922] .offset: 0 nsecs
[ 136.077465] active timers:
[ 136.080293] #0: hardirq_stack, tick_sched_timer, S:01, tick_nohz_restart_sched_tick, swapper/0
[ 136.089309] # expires at 113828250000 nsecs [in 872295258 nsecs]
[ 136.095654] #1: hardirq_stack, hrtimer_wakeup, S:01, futex_wait, automount/2294
[ 136.103356] # expires at 113902196736 nsecs [in 946241994 nsecs]
[ 136.109485] #2: hardirq_stack, hrtimer_wakeup, S:01, do_nanosleep, smartd/2640
[ 136.117101] # expires at 1897330250564 nsecs [in 1784374295822 nsecs]
[ 136.123723] .expires_next : 113871250000 nsecs
[ 136.128726] .exp_prev : 113875250000 nsecs
[ 136.133558] last expires_next stacktrace:
[ 136.137662] update_cpu_base_expires_next
[ 136.141838] hrtimer_interrupt
[ 136.145071] smp_apic_timer_interrupt
[ 136.148935] apic_timer_interrupt
[ 136.152419] cpu_idle
[ 136.154890] start_secondary
[ 136.157967] <00000000>
[ 136.160639] <ffffffff>
[ 136.163267]
[ 136.164789] .hres_active : 1
[ 136.168143] .nr_events : 85123
[ 136.171981] .nohz_mode : 2
[ 136.175344] .idle_tick : 112682250000 nsecs
[ 136.180281] .tick_stopped : 0
[ 136.183737] .idle_jiffies : 4294779978
[ 136.187974] .idle_calls : 44643
[ 136.191708] .idle_sleeps : 1384
[ 136.195424] .idle_entrytime : 113942257992 nsecs
[ 136.200421] .idle_sleeptime : 18596870874 nsecs
[ 136.205158] .last_jiffies : 4294780248
[ 136.209316] .next_jiffies : 4294780261
[ 136.213595] .idle_expires : 112778000000 nsecs
[ 136.218451] jiffies: 4294780248
[ 136.221650]
[ 136.223163]
[ 136.223168] Tick Device: mode: 1
[ 136.228383] Clock Event Device: pit
[ 136.231928] max_delta_ns: 27461866
[ 136.235636] min_delta_ns: 12571
[ 136.239172] mult: 5124677
[ 136.242769] shift: 32
[ 136.245965] mode: 3
[ 136.249095] next_event: 9223372036854775807 nsecs
[ 136.254282] set_next_event: pit_next_event
[ 136.258613] set_mode: init_pit_timer
[ 136.262908] event_handler: tick_handle_oneshot_broadcast
[ 136.268510] tick_broadcast_mask: 00000000
[ 136.272583] tick_broadcast_oneshot_mask: 00000000
[ 136.277310]
[ 136.278840]
[ 136.278845] Tick Device: mode: 1
[ 136.284086] Clock Event Device: lapic
[ 136.287803] max_delta_ns: 671411430
[ 136.291615] min_delta_ns: 1200
[ 136.294977] mult: 53661274
[ 136.298668] shift: 32
[ 136.301902] mode: 3
[ 136.304971] next_event: 112952000000 nsecs
[ 136.309638] set_next_event: lapic_next_event
[ 136.314177] set_mode: lapic_timer_setup
[ 136.318680] event_handler: hrtimer_interrupt
[ 136.323236]
[ 136.323241] Tick Device: mode: 1
[ 136.328404] Clock Event Device: lapic
[ 136.332191] max_delta_ns: 671411430
[ 136.335977] min_delta_ns: 1200
[ 136.339367] mult: 53661274
[ 136.343075] shift: 32
[ 136.346268] mode: 3
[ 136.349375] next_event: 114098250000 nsecs
[ 136.353914] set_next_event: lapic_next_event
[ 136.358357] set_mode: lapic_timer_setup
[ 136.362973] event_handler: hrtimer_interrupt

Regards,
Michal

--
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/