Re: mmotm 2010-01-06-14-34 - crash in CPUfreq

From: Minchan Kim
Date: Mon Jan 11 2010 - 10:40:34 EST


I have experienced same problem in 2.6.33-rc3-mm1.
My machine can't boot.

I try to find this patch but failed.
So I try to understand this problem from below changelog.

I guess it was from assigning the policy inside the lock.
lock_policy_rwsem_write uses policy_cpu but it was not initialized
at that time.

so we meet BUG_ON(policy_cpu == -1);
Quick fix is that return -1 if we meet (policy_cpu == -1) in
lock_policy_rwsem_##mode but it's very ugly.

Totally I don't understand cpufreq's code.
I hope cpufreq guys solve this problem perfectly.

I attach my config

barrios@barrios-desktop:~/mmotm$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
stepping : 6
cpu MHz : 1600.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm
constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 3732.84
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

....

On Sat, Jan 9, 2010 at 7:21 AM, <Valdis.Kletnieks@xxxxxx> wrote:
> On Wed, 06 Jan 2010 14:34:36 PST, akpm@xxxxxxxxxxxxxxxxxxxx said:
>> The mm-of-the-moment snapshot 2010-01-06-14-34 has been uploaded to
>>
>> Â Âhttp://userweb.kernel.org/~akpm/mmotm/
>
> Seeing a 100% reproducible crash, bisected down to this commit:
> (Saw crash in -mmotm1210, but haven't had time/etc to bisect till now)
>
> 46aeb7430f79cb4d03e17fedd6399884ab3aa697 is the first bad commit
> commit 46aeb7430f79cb4d03e17fedd6399884ab3aa697
> Author: Thomas Renninger <trenn@xxxxxxx>
> Date: Â Mon Dec 14 11:44:11 2009 +0100
>
> Â Â[CPUFREQ] Fix race in cpufreq_update_policy()
>
> Â ÂAn update can come in asynchronous, e.g. through an acpi notify event/irq.
> Â ÂIf the policy is assigned before the locking and lock_policy_rwsem_write is
> Â Âheld by onlining/offlining CPU code, the info may be stale
> Â Â(and the policy freed) when the locked code gets entered.
>
> Â ÂFix that by assigning the policy inside the lock.
> Â ÂMichal added some cleanups to properly exit on failure cases -> thanks.
>
> Â ÂSigned-off-by: Thomas Renninger <trenn@xxxxxxx>
> Â ÂCC: Michal Schmidt <mschmidt@xxxxxxxxxx>
> Â ÂSigned-off-by: Dave Jones <davej@xxxxxxxxxx>
>
> Here's the crash:
>
> [ Â Â4.840569] iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)
> [ Â Â4.840832] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
> [ Â Â4.840963] iTCO_vendor_support: vendor-support=0
> [ Â Â4.841404] device-mapper: ioctl: 4.16.0-ioctl (2009-11-05) initialised: dm-devel@xxxxxxxxxx
> [ Â Â4.841551] EDAC MC: Ver: 2.1.0 Jan Â8 2010
> [ Â Â4.842049] ------------[ cut here ]------------
> [ Â Â4.842160] kernel BUG at drivers/cpufreq/cpufreq.c:88!
> [ Â Â4.842271] invalid opcode: 0000 [#1] PREEMPT SMP
> [ Â Â4.842643] last sysfs file:
> [ Â Â4.842750] CPU 1
> [ Â Â4.842941] Pid: 1, comm: swapper Not tainted 2.6.32-08667-g46aeb74 #15 0X564R/Latitude E6500
> [ Â Â4.843031] RIP: 0010:[<ffffffff813942f4>] Â[<ffffffff813942f4>] lock_policy_rwsem_write+0x43/0xad
> [ Â Â4.843126] RSP: 0000:ffff88011f87fd40 ÂEFLAGS: 00010202
> [ Â Â4.843126] RAX: 00000000000104b0 RBX: 0000000000000001 RCX: 000000000000003e
> [ Â Â4.843126] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff81ae1f88
> [ Â Â4.843126] RBP: ffff88011f87fd60 R08: 0000000000000000 R09: ffffffff81b04810
> [ Â Â4.843126] R10: ffffffff8103b7fa R11: ffffffff8184f00d R12: 00000000ffffffff
> [ Â Â4.843126] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> [ Â Â4.843126] FS: Â0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
> [ Â Â4.843126] CS: Â0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ Â Â4.843126] CR2: 00000034478a8440 CR3: 0000000001a1c000 CR4: 00000000000406e0
> [ Â Â4.843126] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ Â Â4.843126] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ Â Â4.843126] Process swapper (pid: 1, threadinfo ffff88011f87e000, task ffff88011f87c040)
> [ Â Â4.843126] Stack:
> [ Â Â4.843126] Âffff88011f87c040 0000000000000000 0000000000000000 00000000ffffffea
> [ Â Â4.843126] <0> ffff88011f87feb0 ffffffff8139558c ffff88011f87fd90 ffffffff810641a8
> [ Â Â4.843126] <0> 0000000000000000 0000000000000000 ffff88011f87fe50 ffffffff81527c4c
> [ Â Â4.843126] Call Trace:
> [ Â Â4.843126] Â[<ffffffff8139558c>] cpufreq_update_policy+0x20/0xfc
> [ Â Â4.843126] Â[<ffffffff810641a8>] ? debug_mutex_free_waiter+0x2e/0x6a
> [ Â Â4.843126] Â[<ffffffff81527c4c>] ? __mutex_lock_common+0x58b/0x5aa
> [ Â Â4.843126] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.843126] Â[<ffffffff815291b0>] ? _raw_spin_unlock_irqrestore+0x72/0x80
> [ Â Â4.843126] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.843126] Â[<ffffffff81527478>] ? __mutex_unlock_slowpath+0x136/0x17b
> [ Â Â4.843126] Â[<ffffffff81066494>] ? trace_hardirqs_on_caller+0x118/0x13c
> [ Â Â4.843126] Â[<ffffffff810664c5>] ? trace_hardirqs_on+0xd/0xf
> [ Â Â4.843126] Â[<ffffffff815274ac>] ? __mutex_unlock_slowpath+0x16a/0x17b
> [ Â Â4.843126] Â[<ffffffff815274c6>] ? mutex_unlock+0x9/0xb
> [ Â Â4.843126] Â[<ffffffff8103b80c>] ? cpu_maps_update_done+0x10/0x12
> [ Â Â4.843126] Â[<ffffffff815113c7>] ? register_cpu_notifier+0x28/0x32
> [ Â Â4.843126] Â[<ffffffff81b48568>] cpufreq_stats_init+0x85/0xad
> [ Â Â4.843126] Â[<ffffffff81b484e3>] ? cpufreq_stats_init+0x0/0xad
> [ Â Â4.843126] Â[<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [ Â Â4.843126] Â[<ffffffff81b21667>] kernel_init+0x15f/0x1b5
> [ Â Â4.843126] Â[<ffffffff810031d4>] kernel_thread_helper+0x4/0x10
> [ Â Â4.843126] Â[<ffffffff81529ac0>] ? restore_args+0x0/0x30
> [ Â Â4.843126] Â[<ffffffff81b21508>] ? kernel_init+0x0/0x1b5
> [ Â Â4.843126] Â[<ffffffff810031d0>] ? kernel_thread_helper+0x0/0x10
> [ Â Â4.843126] Code: 88 1f ae 81 53 31 db 48 83 ec 08 48 8b 14 d5 60 46 b0 81 44 8b 24 10 41 83 fc ff 0f 94 c3 31 d2 89 de e8 f4 42 d0 ff 85 db 74 04 <0f> 0b eb fe 4d 63 e4 48 c7 c3 c0 04 01 00 48 89 df 4a 03 3c e5
> [ Â Â4.843126] RIP Â[<ffffffff813942f4>] lock_policy_rwsem_write+0x43/0xad
> [ Â Â4.843126] ÂRSP <ffff88011f87fd40>
> [ Â Â4.862279] ---[ end trace b837c2c8cb05be90 ]---
> [ Â Â4.862402] Kernel panic - not syncing: Attempted to kill init!
> [ Â Â4.862523] Pid: 1, comm: swapper Tainted: G Â Â ÂD Â Â2.6.32-08667-g46aeb74 #15
> [ Â Â4.862669] Call Trace:
> [ Â Â4.862792] Â[<ffffffff81526031>] panic+0x7f/0x13a
> [ Â Â4.862919] Â[<ffffffff81066392>] ? trace_hardirqs_on_caller+0x16/0x13c
> [ Â Â4.863049] Â[<ffffffff8103db1a>] do_exit+0xd7/0x8d6
> [ Â Â4.863161] Â[<ffffffff8103a328>] ? spin_unlock_irqrestore+0x9/0xb
> [ Â Â4.863288] Â[<ffffffff8103af22>] ? kmsg_dump+0x136/0x150
> [ Â Â4.863402] Â[<ffffffff8152a904>] oops_end+0x89/0x8e
> [ Â Â4.863520] Â[<ffffffff81005dd1>] die+0x55/0x5e
> [ Â Â4.863631] Â[<ffffffff8152a324>] do_trap+0x11c/0x12b
> [ Â Â4.863752] Â[<ffffffff81003ac9>] do_invalid_op+0x8f/0x98
> [ Â Â4.863865] Â[<ffffffff813942f4>] ? lock_policy_rwsem_write+0x43/0xad
> [ Â Â4.863980] Â[<ffffffff8152899f>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [ Â Â4.864104] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.864218] Â[<ffffffff81529af0>] ? irq_return+0x0/0x2
> [ Â Â4.864365] Â[<ffffffff81003055>] invalid_op+0x15/0x20
> [ Â Â4.864478] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.864597] Â[<ffffffff813942f4>] ? lock_policy_rwsem_write+0x43/0xad
> [ Â Â4.864711] Â[<ffffffff8139558c>] cpufreq_update_policy+0x20/0xfc
> [ Â Â4.864825] Â[<ffffffff810641a8>] ? debug_mutex_free_waiter+0x2e/0x6a
> [ Â Â4.864944] Â[<ffffffff81527c4c>] ? __mutex_lock_common+0x58b/0x5aa
> [ Â Â4.865066] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.865185] Â[<ffffffff815291b0>] ? _raw_spin_unlock_irqrestore+0x72/0x80
> [ Â Â4.865311] Â[<ffffffff8103b7fa>] ? cpu_maps_update_begin+0x12/0x14
> [ Â Â4.865434] Â[<ffffffff81527478>] ? __mutex_unlock_slowpath+0x136/0x17b
> [ Â Â4.865548] Â[<ffffffff81066494>] ? trace_hardirqs_on_caller+0x118/0x13c
> [ Â Â4.865662] Â[<ffffffff810664c5>] ? trace_hardirqs_on+0xd/0xf
> [ Â Â4.865782] Â[<ffffffff815274ac>] ? __mutex_unlock_slowpath+0x16a/0x17b
> [ Â Â4.865897] Â[<ffffffff815274c6>] ? mutex_unlock+0x9/0xb
> [ Â Â4.866037] Â[<ffffffff8103b80c>] ? cpu_maps_update_done+0x10/0x12
> [ Â Â4.866153] Â[<ffffffff815113c7>] ? register_cpu_notifier+0x28/0x32
> [ Â Â4.866278] Â[<ffffffff81b48568>] cpufreq_stats_init+0x85/0xad
> [ Â Â4.866391] Â[<ffffffff81b484e3>] ? cpufreq_stats_init+0x0/0xad
> [ Â Â4.866505] Â[<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [ Â Â4.866624] Â[<ffffffff81b21667>] kernel_init+0x15f/0x1b5
> [ Â Â4.866737] Â[<ffffffff810031d4>] kernel_thread_helper+0x4/0x10
> [ Â Â4.866860] Â[<ffffffff81529ac0>] ? restore_args+0x0/0x30
> [ Â Â4.866971] Â[<ffffffff81b21508>] ? kernel_init+0x0/0x1b5
> [ Â Â4.867099] Â[<ffffffff810031d0>] ? kernel_thread_helper+0x0/0x10
>
>



--
Kind regards,
Minchan Kim

Attachment: config.barrios
Description: Binary data