Re: [2.6.33-rc5] Weird deadlock when shutting down
From: AmÃrico Wang
Date: Sat Feb 20 2010 - 02:14:12 EST
On Fri, Feb 19, 2010 at 2:45 AM, Johannes Berg
<johannes@xxxxxxxxxxxxxxxx> wrote:
> On Thu, 2010-02-18 at 08:31 -0800, Linus Torvalds wrote:
>
>> > > halt/4071 is trying to acquire lock:
>> > > Â(s_active){++++.+}, at: [<c0000000001ef868>]
>> > > .sysfs_addrm_finish+0x58/0xc0
>> > >
>> > > but task is already holding lock:
>> > > Â(&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at:
>> [<c0000000004cd6ac>]
>> > > .lock_policy_rwsem_write+0x84/0xf4
>> > >
>> > > which lock already depends on the new lock.
>>
>> You don't have a full backtrace for these things?
>
> No, it deadlocks right there, unfortunately.
>
>> We've had lots of trouble with the cpu governors, and I suspect the
>> problem isn't new, but the lockdep warning is likely new (see commit
>> 846f99749ab68bbc7f75c74fec305de675b1a1bf: "sysfs: Add lockdep
>> annotations
>> for the sysfs active reference").
>>
>> So it is likely to be an old issue that (a) now gets warned about and
>> (b) might have had timing changes enough to trigger it.
>
> Well, it used to not deadlock and actually shut down the machine :) So
> in that sense it's definitely new. It might have printed a lockdep
> warning before, which you wouldn't normally see since the machine turns
> off right after this.
>
>> I suspect it is G5-specific (or specific to whatever CPU frequency
>> code
>> that gets used there), since I think we'd have had lots of reports if
>> this
>> happened on x86.
>
> Yeah, that's puzzling me as well.
>
Does my following untested patch help?
Signed-off-by: WANG Cong <xiyou.wangcong@xxxxxxxxx>
---------
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 67bc2ec..8438f03 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1113,6 +1113,8 @@ static int __cpufreq_remove_dev(struct sys_device *sys_dev)
unsigned int cpu = sys_dev->id;
unsigned long flags;
struct cpufreq_policy *data;
+ struct kobject *kobj;
+ struct completion *cmp;
#ifdef CONFIG_SMP
struct sys_device *cpu_sys_dev;
unsigned int j;
@@ -1192,19 +1194,22 @@ static int __cpufreq_remove_dev(struct sys_device *sys_dev)
if (cpufreq_driver->target)
__cpufreq_governor(data, CPUFREQ_GOV_STOP);
- kobject_put(&data->kobj);
+ kobj = &data->kobj;
+ cmp = &data->kobj_unregister;
+ unlock_policy_rwsem_write(cpu);
+ kobject_put(kobj);
/* we need to make sure that the underlying kobj is actually
* not referenced anymore by anybody before we proceed with
* unloading.
*/
dprintk("waiting for dropping of refcount\n");
- wait_for_completion(&data->kobj_unregister);
+ wait_for_completion(cmp);
dprintk("wait complete\n");
+ lock_policy_rwsem_write(cpu);
if (cpufreq_driver->exit)
cpufreq_driver->exit(data);
-
unlock_policy_rwsem_write(cpu);
free_cpumask_var(data->related_cpus);