Re: [PATCH v3 5/5] cpusets, suspend: Save and restore cpusets during suspend/resume

From: Srivatsa S. Bhat
Date: Wed May 16 2012 - 04:21:40 EST


On 05/16/2012 04:02 AM, David Rientjes wrote:

> On Wed, 16 May 2012, Srivatsa S. Bhat wrote:
>
>>> I know root is special
>>> cased all over the cpuset code, but I think the real fix here is to figure
>>> out why it can't be left as a superset and then we end up doing nothing
>>> for s/r.
>>>
>>> I don't have a preference for cpu hotplug and whether cpuset.cpus = 1-3
>>> remains 1-3 when cpu 2 is offlined or not, I think it could be argued both
>>> ways, but I disagree with saving the cpumask, removing all suspended cpus,
>>> and then reinstating it for no reason.
>>>
>>
>> I think there is a valid reason behind doing that.
>>
>> Cpusets translates to sched domains in scheduler terms. So whenever you update
>> cpusets, the sched domains are updated. IOW, if you don't touch cpusets during
>> hotplug (suspend/resume case), you allow them to have offline cpus, meaning,
>> you allow sched domains to have offline cpus. Hence sched domains are rendered
>> stale.
>>
>
> It's not possible to update the sched domains for s/r to be a subset of
> cpuset.cpus?


Subset? See below..

(Btw, the above statement reminds me of a different idea I had long back
which I will write about in a separate mail.)

> It would be the same situation for a thread using
> sched_setaffinity() while bound to a cpuset with a superset of allowed
> nodes.


First of all, sched domains are built by looking at the cpusets' ->cpus_allowed
mask, not an individual task's ->cpus_allowed mask. So we would gain nothing by
altering an individual task's ->cpus_allowed mask, as sched_setaffinity()
does.

On top of that, the "subset" argument doesn't hold in the s/r case.
sched_setaffinity() tries its best to keep the ->cpus_allowed mask of a task
as a subset of the ->cpus_allowed mask of the cpuset it belongs to.
But with s/r, that's not the case: the two can very well become disjoint sets.
Consider a cpuset having cpuset.cpus = 1. What happens during suspend/resume
then? Going by your suggestion, the tasks in that cpuset will have
->cpus_allowed = 0,2-3 or some other combination not having cpu 1 when cpu 1
gets offlined. And it will keep getting changed into other things depending
on which phase of suspend/resume we are in.

IOW, ->cpus_allowed of the cpuset and ->cpus_allowed of the tasks belonging
to the cpuset can go totally out of sync, with no subset/superset relationship
preserved between them. That is not the case with sched_setaffinity(), where
we always try to maintain a superset-subset relationship between the two.

And in any case, altering individual task's ->cpus_allowed wouldn't buy us
anything, as I mentioned above.

> If you do that, there's no reason to alter cpuset.cpus at all and
> you don't need to carry another cpumask around for each cpuset.
>


Regards,
Srivatsa S. Bhat
