Re: [PATCH 4/4 V5] workqueue: Allow modifying low level unbound workqueue cpumask

From: Lai Jiangshan
Date: Wed Apr 01 2015 - 04:33:01 EST

Next message: Marcin Wojtas: "Re: [PATCH 3/5] ARM: mvebu: Allow using the GIC for wakeup in standby mode"
Previous message: Kamezawa Hiroyuki: "Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed"
Next in thread: Tejun Heo: "Re: [PATCH 4/4 V5] workqueue: Allow modifying low level unbound workqueue cpumask"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi, Frederic, TJ

I considered one special case but forgot to consider another.

====

Let @L = the low level unbound workqueue cpumask.
Let @U = the user setting cpumask (wq->unbound_attrs->cpumask).

Thus the pwqs in the specified wq are controlled by @L & @U (& = cpumask_and()).
The per-node pwqs are always controlled by @L & @U, so we don't discuss them here.
This mail focuses only on the default pwq.

What happens to the default pwq when @L & @U is an empty cpumask?
I considered this case in the patch. In my patch, the dfl_pwq directly
uses @U. The reasons:
1) it is not a good idea to directly use cpu_possible_mask.
2) it is not a good idea either to use @L & @U ( = empty); in that case
the scheduler will use cpu_possible_mask.
3) so we have to choose one of @L and @U.
4) I chose @U instead of @L.
@L: it is the low level global cpumask and it controls *ALL* wqs.
@U: it is set by the *USER* and it controls only one *SPECIFIC* wq.
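
To make the choice above concrete, here is a minimal user-space sketch, not
kernel code. It assumes a cpumask can be modeled as a 64-bit integer with one
bit per cpu; mask_t, dfl_pwq_mask() and the cpu numbers are hypothetical names
and values used only for illustration, they do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model only: one bit per cpu, not the kernel's struct cpumask. */
typedef uint64_t mask_t;

/*
 * Hypothetical helper (the name is not from the patch): choose the cpumask
 * of the default pwq from @L (low) and @U (user) as described above.
 */
mask_t dfl_pwq_mask(mask_t low, mask_t user)
{
	mask_t both = low & user;	/* models cpumask_and() */

	/* Empty intersection: fall back to the per-wq user mask @U. */
	return both ? both : user;
}

/* Two illustrative inputs; the cpu numbers are made up for the example. */
void dfl_pwq_mask_examples(void)
{
	/* @L = 0-3, @U = 2-5: the intersection 2-3 is used. */
	assert(dfl_pwq_mask(0x0f, 0x3c) == 0x0c);
	/* @L = 0-3, @U = 4-7: empty intersection, so @U (4-7) is used. */
	assert(dfl_pwq_mask(0x0f, 0xf0) == 0xf0);
}
```

The sketch only looks at @L and @U; it says nothing about which cpus are
online.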

Frederic, TJ, neither of you said anything about this early, quick decision of mine.
Should this case be handled specially? If so, does this decision meet your requirements?

======

A comment from TJ reminded me that the final cpumask determined by the scheduler
is more important.

Let @O = cpu_online_mask.

The missing case:
(@L & @U) is not empty but (@L & @U & @O) is empty.

In my old code (V5 patchset), the dfl_pwq uses (@L & @U), and the scheduler will
use cpu_possible_mask instead because none of the cpus in (@L & @U) is online.
That is bad: the pwq is then controlled by neither @L nor @U.

I think we may use @U for the dfl_pwq in this case. But that introduces
a problem:

When (@L & @U) has an online cpu, the dfl_pwq's cpumask is (@L & @U).
When (@L & @U) has no online cpu, the dfl_pwq's cpumask is @U.
This means the dfl_pwq may need to be reallocated during cpu hotplug add/remove,
which in turn means wq_update_unbound_numa() can fail.
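
The hotplug dependency can be seen in a small user-space sketch, again not
kernel code. It assumes a cpumask modeled as a 64-bit integer; mask_t,
dfl_pwq_mask_online() and the cpu numbers are hypothetical and only illustrate
the rule stated above.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model only: one bit per cpu, not the kernel's struct cpumask. */
typedef uint64_t mask_t;

/*
 * Hypothetical helper (the name is not from the patch): pick the dfl_pwq
 * cpumask from @L (low), @U (user) and @O (online).
 */
mask_t dfl_pwq_mask_online(mask_t low, mask_t user, mask_t online)
{
	mask_t both = low & user;	/* models cpumask_and() */

	/* Use @L & @U only while it still contains an online cpu. */
	return (both & online) ? both : user;
}

/* The same @L and @U give different masks before and after an offline event. */
void dfl_pwq_hotplug_example(void)
{
	mask_t low = 0x0f, user = 0x3c;	/* @L = 0-3, @U = 2-5 */

	/* cpus 0-7 online: (@L & @U) = 2-3 has online cpus. */
	mask_t before = dfl_pwq_mask_online(low, user, 0xff);
	/* cpus 2 and 3 go offline: (@L & @U) has no online cpu. */
	mask_t after = dfl_pwq_mask_online(low, user, 0xf3);

	assert(before == 0x0c);
	assert(after == 0x3c);
	assert(before != after);	/* a different mask needs a different pwq */
}
```

Because the result changes on a hotplug event, the hotplug path would have to
obtain a pwq for the new mask, and that allocation is where a failure could
come from.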

Frederic, TJ, any comments about this case?
TJ, would you be willing to let wq_update_unbound_numa() fail?

thanks
Lai

On 03/31/2015 03:46 PM, Lai Jiangshan wrote:
> On 03/25/2015 01:31 AM, Tejun Heo wrote:
>> On Wed, Mar 18, 2015 at 12:40:17PM +0800, Lai Jiangshan wrote:
>>> Ordered workqueues are ignored by the low level unbound workqueue cpumask;
>>> they will be handled in the near future.
>>
>> Ugh, right, ordered workqueues are tricky. Maybe we should change how
>> ordered workqueues are implemented. Just gate work items at the
>> workqueue layer instead of fiddling with max_active and the number of
>> pwqs.
>>
>>> static struct wq_unbound_install_ctx *
>>> wq_unbound_install_ctx_prepare(struct workqueue_struct *wq,
>>> - const struct workqueue_attrs *attrs)
>>> + const struct workqueue_attrs *attrs,
>>> + cpumask_var_t unbound_cpumask)
>>> {
>> ...
>>> /* make a copy of @attrs and sanitize it */
>>> copy_workqueue_attrs(new_attrs, attrs);
>>> - cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
>>> + copy_workqueue_attrs(pwq_attrs, attrs);
>>> + cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
>>> + cpumask_and(pwq_attrs->cpumask, pwq_attrs->cpumask, unbound_cpumask);
>>
>> Hmmm... we weren't checking whether the intersection becomes null
>> before.
>
> Did you refer to the following unquoted code "cpumask_empty(pwq_attrs->cpumask)"?
>
> It is explained in the changelog and the comments.
>
>> Why are we doing it now? Note that this doesn't really make
>> things water-tight as cpu on/offlining can still leave the mask w/o
>> any online cpus. Shouldn't we just let the scheduler handle it as
>> before?
>
> Did you refer to "cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);"?
>
> new_attrs will be copied to wq->unbound_attrs, so we want it to be sane.
> The code before this patchset did the same thing.
>
> And it may be used for the default pwq, and it can reduce pool creation:
> cpu_possible_mask = 0-7
> wq_unbound_cpumask = 0-3
> user1 tries to set wq1: attrs->cpumask = 4-9
> user2 tries to set wq2: attrs->cpumask = 4-11
> thus the default pwqs of both wq1 and wq2 use the same pool. (pool's cpumask = 4-7)
>
>
>>
>>> @@ -3712,6 +3726,9 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
>>> * wq's, the default pwq should be used.
>>> */
>>> if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
>>> + cpumask_and(cpumask, cpumask, wq_unbound_cpumask);
>>> + if (cpumask_empty(cpumask))
>>> + goto use_dfl_pwq;
>>
>> So, this special handling is necessary only because we did special in
>> the above for dfl_pwq. Why do we need these?
>
> wq->unbound_attrs holds the user-set attrs; its cpumask is not controlled by
> wq_unbound_cpumask, so we need these cpumask_and() calls.
>
> Another question:
> Why is wq->unbound_attrs' cpumask not controlled by wq_unbound_cpumask?
>
> I want wq->unbound_attrs to always stay the same as the user's last setting,
> regardless of how many times wq_unbound_cpumask is changed.
>
>>
>>> +static int unbounds_cpumask_apply(cpumask_var_t cpumask)
>>> +{
>> ..
>>> + list_for_each_entry_safe(ctx, n, &ctxs, list) {
>>> + if (ret >= 0)
>>
>> Let's do !ret.
>>
>>> + wq_unbound_install_ctx_commit(ctx);
>>> + wq_unbound_install_ctx_free(ctx);
>>> + }
>> ...
>>> +/**
>>> + * workqueue_unbounds_cpumask_set - Set the low-level unbound cpumask
>>> + * @cpumask: the cpumask to set
>>> + *
>>> + * The low-level workqueues cpumask is a global cpumask that limits
>>> + * the affinity of all unbound workqueues. This function checks the @cpumask,
>>> + * applies it to all unbound workqueues and updates all of their pwqs.
>>> + * When all succeed, it saves @cpumask to the global low-level unbound
>>> + * cpumask.
>>> + *
>>> + * Return: 0 - Success
>>> + * -EINVAL - No online cpu in the @cpumask
>>> + * -ENOMEM - Failed to allocate memory for attrs or pwqs.
>>> + */
>>> +int workqueue_unbounds_cpumask_set(cpumask_var_t cpumask)
>>> +{
>>> + int ret = -EINVAL;
>>> +
>>> + get_online_cpus();
>>> + cpumask_and(cpumask, cpumask, cpu_possible_mask);
>>> + if (cpumask_intersects(cpumask, cpu_online_mask)) {
>>
>> Does this make sense? We can't prevent cpus going down right after
>> the mask is set. What's the point of preventing empty config if we
>> can't prevent transitions into it and have to handle it anyway?
>
> It is like set_cpus_allowed_ptr(): the cpumask must be valid when it is set,
> although it may later transition to having no online cpu.
>
> This code originated from Frederic. Maybe he has a stronger reason.
>
>>
>>> +static ssize_t unbounds_cpumask_store(struct device *dev,
>>> + struct device_attribute *attr,
>>> + const char *buf, size_t count)
>>
>> Naming is too confusing. Please pick a name which clearly
>> distinguishes per-wq and global masking.
>
> What about these names?
> wq_unbound_cpumask ==> wq_unbound_global_cpumask
> workqueue_unbounds_cpumask_set() ==> workqueue_set_unbound_global_cpumask(). (public API)
> unbounds_cpumask_store() ==> wq_store_unbound_global_cpumask() (static function for sysfs)
>
>>
>> Thanks.
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
>
