Re: WARNING: at kernel/rcutree.c:1558 rcu_do_batch+0x386/0x3a0(), during CPU hotplug
From: Srivatsa S. Bhat
Date: Thu Sep 13 2012 - 04:35:50 EST
On 09/12/2012 06:06 PM, Srivatsa S. Bhat wrote:
> On 07/19/2012 10:45 PM, Paul E. McKenney wrote:
>> On Thu, Jul 19, 2012 at 05:39:30PM +0530, Srivatsa S. Bhat wrote:
>>> Hi Paul,
>>>
>>> While running a CPU hotplug stress test on v3.5-rc7+
>>> (mainline commit 8a7298b7805ab) I hit this warning.
>>> I haven't tried to debug this yet...
>>>
>>> Line number 1550 maps to:
>>>
>>> WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
>>>
>>> inside rcu_do_batch().
>>
>> Hello, Srivatsa,
>>
>> I believe that you need commit a16b7a69 (Prevent __call_rcu() from
>> invoking RCU core on offline CPUs), which is currently in -tip, queued
>> for 3.6. Please see below for the patch.
>>
>> Does this help?
>>
>
> Hi Paul,
>
> I am hitting the cpu_is_offline() warning in rcu_do_batch() (see 2 of the
> examples below) occasionally while testing CPU hotplug on Thomas' smp/hotplug
> branch in -tip. It does contain the commit that you had mentioned above.
>

I also hit some writeback-related problems during some of these runs, but I was
not able to reproduce them after that occurrence. (Adding the relevant people to CC.)

I hit the divide error shown below during the CPU hotplug test run, and the general
protection fault afterwards, while trying to shut down the machine after the test.

Regards,
Srivatsa S. Bhat
[ 522.987310] SMP alternatives: switching to SMP code
[ 522.999101] smpboot: Booting Node 1 Processor 7 APIC 0x16
[ 524.083872] SMP alternatives: lockdep: fixing up alternatives
[ 524.090053] smpboot: Booting Node 0 Processor 8 APIC 0x1
[ 525.148720] SMP alternatives: lockdep: fixing up alternatives
[ 525.154970] smpboot: Booting Node 0 Processor 9 APIC 0x3
[ 526.024180] divide error: 0000 [#1] SMP
[ 526.028144] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 526.028145] CPU 9
[ 526.028145] Pid: 2235, comm: flush-8:0 Not tainted 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033
[ 526.028145] RIP: 0010:[<ffffffff811276f6>] [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
[ 526.028145] RSP: 0018:ffff8811530bfcc0 EFLAGS: 00010206
[ 526.028145] RAX: 0000000000b9877e RBX: 00000000001a8112 RCX: 28f5c28f5c28f5c3
[ 526.028145] RDX: 0000000000000000 RSI: 0000000000b9877e RDI: 0000000000000000
[ 526.028145] RBP: ffff8811530bfce0 R08: 0000000000000010 R09: 0000000000000000
[ 526.028145] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8808d4408e20
[ 526.028145] R13: ffff8808d4408e20 R14: ffff8808d44091a0 R15: 0000000000000000
[ 526.028145] FS: 0000000000000000(0000) GS:ffff8808ddd40000(0000) knlGS:0000000000000000
[ 526.028145] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 526.028145] CR2: 00007fa35dd4eb60 CR3: 0000000001a0c000 CR4: 00000000000007e0
[ 526.028145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 526.028145] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 526.028145] Process flush-8:0 (pid: 2235, threadinfo ffff8811530be000, task ffff88115315c5e0)
[ 526.028145] Stack:
[ 526.028145] 0400000000000000 0000000000000007 ffff8808d4408e20 ffffffffffffffee
[ 526.028145] ffff8811530bfd10 ffffffff811ae95c 0000000000350225 00000000001a8112
[ 526.028145] 0000000000000000 0000000000000002 ffff8811530bfdc0 ffffffff811b0620
[ 526.028145] Call Trace:
[ 526.209272] SMP alternatives: lockdep: fixing up alternatives
[ 526.209275] smpboot: Booting Node 0 Processor 10 APIC 0x5
[ 526.220012] [<ffffffff811ae95c>] over_bground_thresh+0x7c/0x90
[ 526.220012] [<ffffffff811b0620>] wb_do_writeback+0x170/0x310
[ 526.220012] [<ffffffff811b08eb>] bdi_writeback_thread+0x12b/0x420
[ 526.220012] [<ffffffff811b07c0>] ? wb_do_writeback+0x310/0x310
[ 526.220012] [<ffffffff8106deae>] kthread+0xde/0xf0
[ 526.220012] [<ffffffff814c6184>] kernel_thread_helper+0x4/0x10
[ 526.220012] [<ffffffff814bc1f0>] ? retint_restore_args+0x13/0x13
[ 526.220012] [<ffffffff8106ddd0>] ? __init_kthread_worker+0x70/0x70
[ 526.220012] [<ffffffff814c6180>] ? gs_change+0x13/0x13
[ 526.220012] Code: 28 5c 8f c2 f5 28 8b 7d e0 48 89 c6 48 0f af f3 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 e8 48 89 f0 <48> f7 f7 41 8b 94 24 74 02 00 00 48 0f af d3 48 89 c7 48 c1 ea
[ 526.220012] RIP [<ffffffff811276f6>] bdi_dirty_limit+0x66/0xc0
[ 526.220012] RSP <ffff8811530bfcc0>
[ 526.304469] ---[ end trace bcfc7ab74bdb11a5 ]---
[ 527.330948] SMP alternatives: lockdep: fixing up alternatives
----
[ 1941.614775] SMP alternatives: lockdep: fixing up alternatives
[ 1941.620614] smpboot: Booting Node 1 Processor 5 APIC 0x12
[ 1941.657424] SMP alternatives: lockdep: fixing up alternatives
[ 1941.663215] smpboot: Booting Node 1 Processor 6 APIC 0x14
[ 1992.721819] general protection fault: 0000 [#2] SMP
[ 1992.724844] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf fuse loop dm_mod iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm cdc_ether pcspkr usbnet shpchp pci_hotplug i2c_i801 i2c_core ioatdma mii crc32c_intel serio_raw microcode lpc_ich mfd_core i7core_edac bnx2 dca edac_core tpm_tis tpm sg tpm_bios rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1992.726995] CPU 8
[ 1992.726995] Pid: 19654, comm: shutdown Tainted: G D 3.6.0-rc1-tglx-hotplug-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033
[ 1992.726995] RIP: 0010:[<ffffffff810843d7>] [<ffffffff810843d7>] try_to_wake_up+0x57/0x2f0
[ 1992.726995] RSP: 0018:ffff8808d47e5e58 EFLAGS: 00010002
[ 1992.726995] RAX: 6b6b6b6b6b6b6b6b RBX: 000000000000000f RCX: 000000006b6b6b6b
[ 1992.726995] RDX: 000000006b6c6b6b RSI: ffffffff817a7fbf RDI: ffff88115315cdd0
[ 1992.726995] RBP: ffff8808d47e5e98 R08: 0000000000000002 R09: 0000000000000001
[ 1992.726995] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88115315c5e0
[ 1992.726995] R13: 000000006b6b6b6b R14: ffff8808d44091a0 R15: 0000000000000000
[ 1992.726995] FS: 00007f32b1beb700(0000) GS:ffff8808ddd00000(0000) knlGS:0000000000000000
[ 1992.726995] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1992.726995] CR2: 00007f32b173d1a0 CR3: 0000001153bd0000 CR4: 00000000000007e0
[ 1992.726995] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1992.726995] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1992.726995] Process shutdown (pid: 19654, threadinfo ffff8808d47e4000, task ffff8808d4b045e0)
[ 1992.726995] Stack:
[ 1992.726995] 0000000000000246 ffff88115315cdd0 0000000000000286 0000000000000000
[ 1992.726995] ffff8808d4408e20 ffff8808d5433288 ffff8808d44091a0 00746c6168206d65
[ 1992.726995] ffff8808d47e5ea8 ffffffff810846a0 ffff8808d47e5ed8 ffffffff811adce1
[ 1992.726995] Call Trace:
[ 1992.726995] [<ffffffff810846a0>] wake_up_process+0x10/0x20
[ 1992.726995] [<ffffffff811adce1>] bdi_queue_work+0xd1/0x1f0
[ 1992.726995] [<ffffffff811ae7d9>] __bdi_start_writeback+0x79/0x160
[ 1992.726995] [<ffffffff811af1b0>] wakeup_flusher_threads+0x120/0x1e0
[ 1992.726995] [<ffffffff811af0ca>] ? wakeup_flusher_threads+0x3a/0x1e0
[ 1992.726995] [<ffffffff811b45b2>] sys_sync+0x22/0x90
[ 1992.726995] [<ffffffff814c4fb9>] system_call_fastpath+0x16/0x1b
[ 1992.726995] Code: 31 ed 48 89 c7 48 89 45 c8 e8 46 72 43 00 48 89 45 d0 49 8b 04 24 85 c3 0f 84 02 02 00 00 45 8b 6c 24 2c 49 8b 44 24 08 45 85 ed <44> 8b 70 18 74 75 8b 1d dd ea 9d 00 85 db 0f 85 2d 02 00 00 49
[ 1992.726995] RIP [<ffffffff810843d7>] try_to_wake_up+0x57/0x2f0
[ 1992.726995] RSP <ffff8808d47e5e58>
[ 1992.726995] ---[ end trace bcfc7ab74bdb11a6 ]---
[ 1992.726995] Kernel panic - not syncing: Fatal exception in interrupt
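
A note on the first oops: the faulting bytes marked in its Code: line, <48> f7 f7, decode as `div %rdi`, and RDI is 0 in that register dump, so bdi_dirty_limit() issued a 64-bit unsigned division by zero, which x86 reports as a "divide error". The sketch below illustrates that failure mode in user space under stated assumptions only: `scaled_share()` is a hypothetical helper, not the kernel's bdi_dirty_limit(), that scales a global limit by a numerator/denominator fraction and checks the denominator before dividing.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): return this device's share of
 * a global limit, i.e. limit * numerator / denominator.
 *
 * On x86-64 an unsigned 64-bit DIV with a zero divisor raises the #DE
 * exception, which the kernel reports as "divide error". The guard
 * below ensures such a division is never issued.
 */
uint64_t scaled_share(uint64_t limit, uint64_t numerator,
		      uint64_t denominator)
{
	/* No meaningful fraction yet: report a zero share instead of faulting. */
	if (denominator == 0)
		return 0;

	/* Multiply first to keep precision; wraps modulo 2^64 on overflow. */
	return limit * numerator / denominator;
}
```

For example, scaled_share(1000, 3, 4) yields 750, while scaled_share(1000, 3, 0) yields 0 rather than trapping. Returning 0 for an undefined fraction is an arbitrary choice made for this sketch; a caller could just as well fall back to the unscaled limit.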