Re: [PATCH] cpumask: fix lg_lock/br_lock.

From: Srivatsa S. Bhat
Date: Wed Feb 29 2012 - 06:13:13 EST


On 02/29/2012 02:47 PM, Ingo Molnar wrote:

>
> * Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Andrew,
>>
>> On 02/29/2012 02:57 AM, Andrew Morton wrote:
>>
>>> On Tue, 28 Feb 2012 09:43:59 +0100
>>> Ingo Molnar <mingo@xxxxxxx> wrote:
>>>
>>>> This patch should also probably go upstream through the
>>>> locking/lockdep tree? Mind sending it us once you think it's
>>>> ready?
>>>
>>> Oh goody, that means you own
>>> http://marc.info/?l=linux-kernel&m=131419353511653&w=2.
>>>
>>
>>
>> That bug got fixed sometime around Dec 2011. See commit e30e2fdf
>> (VFS: Fix race between CPU hotplug and lglocks)
>
> The lglocks code is still CPU-hotplug racy AFAICS, despite the
> ->cpu_lock complication:
>
> Consider a taken global lock on a CPU:
>
> CPU#1
> ...
> br_write_lock(vfsmount_lock);
>
> this takes the lock of all online CPUs: say CPU#1 and CPU#2. Now
> CPU#3 comes online and takes the read lock:


CPU#3 cannot come online! :-)

No new CPU can come online until the corresponding br_write_unlock() has
completed. br_write_lock() acquires &name##_cpu_lock and only
br_write_unlock() releases it; the CPU_UP_PREPARE callback tries to acquire
that very same spinlock and hence keeps spinning until br_write_unlock() is
run. So CPU#3 (or any new CPU, for that matter) cannot complete coming
online until br_write_unlock() is done.
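
To make that concrete, here is a minimal, hypothetical sketch of the idea.
This is not the actual DEFINE_LGLOCK() macro code from include/linux/lglock.h;
the demo_* names are made up for illustration and initialization details are
simplified:

#include <linux/spinlock.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/cpu.h>
#include <linux/notifier.h>

static DEFINE_SPINLOCK(demo_cpu_lock);        /* plays the role of name##_cpu_lock */
static cpumask_t demo_cpus;                   /* plays the role of name##_cpus     */
static DEFINE_PER_CPU(spinlock_t, demo_lock) = __SPIN_LOCK_UNLOCKED(demo_lock);

static void demo_write_lock(void)
{
	int i;

	/* Taken here and held across the whole write-side critical section. */
	spin_lock(&demo_cpu_lock);
	for_each_cpu(i, &demo_cpus)
		spin_lock(per_cpu_ptr(&demo_lock, i));
}

static void demo_write_unlock(void)
{
	int i;

	for_each_cpu(i, &demo_cpus)
		spin_unlock(per_cpu_ptr(&demo_lock, i));
	/* Only here does a pending CPU_UP_PREPARE stop spinning. */
	spin_unlock(&demo_cpu_lock);
}

static int demo_cpu_callback(struct notifier_block *nb,
			     unsigned long action, void *hcpu)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
		/* Spins until any in-flight writer calls demo_write_unlock(). */
		spin_lock(&demo_cpu_lock);
		cpumask_set_cpu((unsigned long)hcpu, &demo_cpus);
		spin_unlock(&demo_cpu_lock);
		break;
	}
	return NOTIFY_OK;
}

So a new CPU's bit only gets added to demo_cpus (and thereby becomes visible
to future writers) while no writer is in flight.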

It is of course debatable as to how good this design really is, but IMHO,
the lglocks code is not CPU-hotplug racy now..

Here is the link to the original discussion during the development of
that patch: thread.gmane.org/gmane.linux.file-systems/59750/

>
> CPU#3
> br_read_lock(vfsmount_lock);
>
> This will succeed while the br_write_lock() is still active,
> because CPU#1 has only taken the locks of CPU#1 and CPU#2.
>
> Crash!
>
> The proper fix would be for CPU-online to serialize with all
> known lglocks, via the notifier callback, i.e. to do something
> like this:
>
> case CPU_UP_PREPARE:
> for_each_online_cpu(cpu) {
> spin_lock(&name##_cpu_lock);
> spin_unlock(&name##_cpu_lock);
> }
> ...
>
> I.e. in essence do:
>
> case CPU_UP_PREPARE:
> name##_global_lock_online();
> name##_global_unlock_online();
>
> Another detail I noticed, this bit:
>
> register_hotcpu_notifier(&name##_lg_cpu_notifier); \
> get_online_cpus(); \
> for_each_online_cpu(i) \
> cpu_set(i, name##_cpus); \
> put_online_cpus(); \
>
> could be something simpler and loop-less, like:
>
> get_online_cpus();
> cpumask_copy(name##_cpus, cpu_online_mask);
> register_hotcpu_notifier(&name##_lg_cpu_notifier);
> put_online_cpus();
>


While the cpumask_copy() is definitely better, we can't put the
register_hotcpu_notifier() within get/put_online_cpus(), because that would
lead to an ABBA deadlock with a newly initiated CPU hotplug operation, the
two locks involved being cpu_add_remove_lock and the cpu_hotplug lock.
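
As I understand the relevant call chains (simplified), the ordering problem
with that nesting would look roughly like this:

  Registration path (notifier registration nested inside get_online_cpus()):
      get_online_cpus()              // takes the cpu_hotplug lock
        register_hotcpu_notifier()
          cpu_maps_update_begin()    // waits for cpu_add_remove_lock

  Concurrent cpu_up()/cpu_down():
      cpu_maps_update_begin()        // takes cpu_add_remove_lock
        cpu_hotplug_begin()          // waits for the cpu_hotplug lock

Each path ends up waiting for the lock the other one already holds.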

IOW, at the moment there is no absolutely race-free way to do CPU hotplug
callback registration. Some time ago, while going through the asynchronous
booting patch by Arjan [1], I had written up a patch to fix that race,
because the race went from "purely theoretical" to "very real" with the
async boot patch, as shown by the powerpc boot failures [2].
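
Just to illustrate the registration race itself, here is a hypothetical
init sequence (the demo_* names and the notifier are made up, purely for
illustration):

#include <linux/cpu.h>
#include <linux/init.h>

static void demo_setup_cpu(int cpu);            /* hypothetical per-CPU init */
static struct notifier_block demo_cpu_notifier; /* hypothetical hotplug callback */

static int __init demo_init(void)
{
	int cpu;

	get_online_cpus();
	for_each_online_cpu(cpu)
		demo_setup_cpu(cpu);    /* initialize the currently online CPUs */
	put_online_cpus();

	/*
	 * Race window: a CPU coming online right here is missed -- it was
	 * not covered by the loop above and the notifier is not registered
	 * yet.  Registering the notifier first merely flips the problem
	 * (the callback can run before demo_setup_cpu() has run), and
	 * nesting the registration inside get/put_online_cpus() deadlocks
	 * as described above.
	 */
	register_hotcpu_notifier(&demo_cpu_notifier);
	return 0;
}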

But then I stopped short of posting that patch to the lists, because I
started wondering how important that race would actually turn out to be,
in case the async booting design takes a totally different approach
altogether.. [Another reason I didn't post it is that it would require
lots of changes in many places where CPU hotplug registration is done,
and that probably wouldn't be justified (I don't know..) if the race
remained only theoretical, as it is now.]

[1]. http://thread.gmane.org/gmane.linux.kernel/1246209
[2]. https://lkml.org/lkml/2012/2/13/383

Regards,
Srivatsa S. Bhat
