Re: [RFC PATCH 01/10] CPU hotplug: Introduce "stable" cpu onlinemask, for atomic hotplug readers
From: Michael Wang
Date: Tue Dec 04 2012 - 22:29:00 EST
On 12/05/2012 10:56 AM, Michael Wang wrote:
[...]
>>
>> I wonder about the cpu-online case. A typical caller might want to do:
>>
>>
>> /*
>>  * Set each online CPU's "foo" to "bar"
>>  */
>>
>> int global_bar;
>>
>> void set_cpu_foo(int bar)
>> {
>> 	get_online_cpus_stable_atomic();
>> 	global_bar = bar;
>> 	for_each_online_cpu_stable()
>> 		cpu->foo = bar;
>> 	put_online_cpus_stable_atomic();
>> }
>>
>> void cpu_online_notifier_handler(void)
>> {
>> 	cpu->foo = global_bar;
>> }
Oh, forgive me for misunderstanding your question :(
In this case, we have to prevent hotplug from happening, not just ensure
that the online mask is correct.
Hmm... this needs more thought.
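
To make the problem concrete, here is a rough userspace model of one bad
interleaving. It is only a single-threaded sketch, not kernel code: every
model_* name is made up for illustration, and it assumes the reader walks
the online mask as it looked when the critical section was entered. With
block_hotplug set, the hotplug request is deferred until the reader leaves,
which is the behaviour the old mutex-based rule gives us.

```c
/*
 * Userspace, single-threaded model of the race; not kernel code.
 * All model_* names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 4

static bool cpu_online[NR_CPUS];
static int cpu_foo[NR_CPUS];
static int global_bar;

/* Stand-in for the online notifier: the new CPU copies global_bar. */
static void model_cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_foo[cpu] = global_bar;
	cpu_online[cpu] = true;
}

/*
 * Replays one fixed interleaving and returns the foo value that the
 * hot-added CPU ends up with.
 */
static int model_run(bool block_hotplug, int old_bar, int new_bar)
{
	const int new_cpu = 1;
	bool snapshot[NR_CPUS];
	bool deferred = false;
	int cpu;

	/* Start state: only CPU 0 is online and holds the old value. */
	memset(cpu_online, 0, sizeof(cpu_online));
	memset(cpu_foo, 0, sizeof(cpu_foo));
	cpu_online[0] = true;
	cpu_foo[0] = old_bar;
	global_bar = old_bar;

	/* Reader enters the critical section and records the mask. */
	memcpy(snapshot, cpu_online, sizeof(snapshot));

	/* A hotplug request arrives before global_bar is written. */
	if (block_hotplug)
		deferred = true;	/* writer must wait for the reader */
	else
		model_cpu_up(new_cpu);	/* notifier copies the OLD value */

	/* Body of set_cpu_foo(). */
	global_bar = new_bar;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (snapshot[cpu])
			cpu_foo[cpu] = new_bar;

	/* Reader leaves; a deferred hotplug now runs and sees new_bar. */
	if (deferred)
		model_cpu_up(new_cpu);

	return cpu_foo[new_cpu];
}
```

With model_run(false, 1, 2) the new CPU is left holding the stale value 1;
with model_run(true, 1, 2) it ends up with 2. So a correct mask alone does
not help the caller here.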
Regards,
Michael Wang
>>
>> And I think that set_cpu_foo() would be buggy, because a CPU could come
>> online before global_bar was altered, and that newly-online CPU would
>> pick up the old value of `bar'.
>>
>> So what's the rule here? global_bar must be written before we run
>> get_online_cpus_stable_atomic()?
>>
>> Anyway, please have a think and spell all this out?
>
> That's right. This actually comes down to one question: should hotplug
> be prevented between get_online and put_online?
>
> Under the old mutex-based API the answer is yes: hotplug cannot happen
> inside the critical section. The cost is that get_online() may block,
> which kills performance.
>
> So we designed this mechanism as an acceleration. But as you pointed
> out, although the result is never wrong, the 'stable' mask is not
> really stable, since it can change inside the critical section.
>
> We have two possible solutions.
>
> One, from Srivatsa, uses 'read_lock' and 'write_lock'. It prevents
> hotplug from happening, just like the old rule. The cost is a global
> 'rw_lock', which performs badly on NUMA systems, and get_online() will
> certainly block for a short time while a hotplug is in progress.
>
> The other is to maintain a per-cpu cached mask. This mask is updated
> only in get_online() and is used inside the critical section, so we get
> a truly stable mask. The one flaw is that CPUs in the critical section
> at the same time may each see a different online mask.
>
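
A rough userspace sketch of the per-cpu cached mask idea, to show both the
stability and the flaw. This is only a model under my own assumptions:
get_online_cached(), cpu_in_cached_mask() and model_hotplug_online() are
hypothetical names, and the mask is a plain bitmask rather than a real
cpumask.

```c
/*
 * Userspace model of a per-cpu cached online mask; not kernel code.
 * All function names here are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Global online mask, changed only by (modelled) hotplug. CPU 0 is up. */
static unsigned long online_mask = 0x1;

/* One cached copy per CPU, refreshed only in get_online_cached(). */
static unsigned long cached_mask[NR_CPUS];

/* Entering the critical section on this_cpu: take a private copy. */
static void get_online_cached(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	cached_mask[this_cpu] = online_mask;
}

/* Inside the critical section, readers consult only their own copy. */
static bool cpu_in_cached_mask(int this_cpu, int cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (cached_mask[this_cpu] >> cpu) & 1UL;
}

/* Hotplug updates the global mask but never touches the cached copies. */
static void model_hotplug_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	online_mask |= 1UL << cpu;
}
```

If CPU 0 calls get_online_cached(0), then CPU 2 comes online, then CPU 1
calls get_online_cached(1), CPU 0 keeps seeing its old mask without CPU 2
while CPU 1 sees CPU 2 as online. Each copy is stable, but the two readers
disagree.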
> We would appreciate comments on which one to use in the next version.
>
> Regards,
> Michael Wang
>
>>
>>> struct take_cpu_down_param {
>>> unsigned long mod;
>>> void *hcpu;
>>> @@ -246,7 +351,9 @@ struct take_cpu_down_param {
>>> static int __ref take_cpu_down(void *_param)
>>> {
>>> struct take_cpu_down_param *param = _param;
>>> - int err;
>>> + int err, cpu = (long)(param->hcpu);
>>> +
>>
>> Like this please:
>>
>> int err;
>> int cpu = (long)(param->hcpu);
>>
>>> + prepare_cpu_take_down(cpu);
>>>
>>> /* Ensure this CPU doesn't handle any more interrupts. */
>>> err = __cpu_disable();
>>>
>>> ...
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>