Re: [PATCH V2 0/7] sched/deadline: fix cpusets bandwidth accounting

From: Mathieu Poirier
Date: Mon Feb 05 2018 - 15:49:13 EST


On Fri, Feb 02, 2018 at 02:17:50PM +0100, Luca Abeni wrote:
> Hi Mathieu,
>
> On Thu, 1 Feb 2018 09:51:02 -0700
> Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> wrote:
>
> > This is the follow-up patchset to [1] that attempts to fix a problem
> > reported by Steve Rostedt [2] where DL bandwidth accounting is not
> > recomputed after CPUset and CPU hotplug operations. When CPU hotplug and
> > some CPUset manipulations take place, root domains are destroyed and new ones
> > created, losing at the same time DL accounting information pertaining to
> > utilisation. Please see [1] for a full description of the approach.
>
> I do not know the cgroup / cpuset code very well, so I have no useful
> comments on your patches... But I think this patchset is a nice
> improvement with respect to the current situation.
>
> [...]
> > A notable addition is patch 7/7 - it addresses a problem seen when hot
> > plugging out a CPU where a DL task is running (see changelog for full
> > details). The issue is unrelated to this patchset and will manifest
> > itself on a mainline kernel.
>
> I think I introduced this bug with my reclaiming patches, so I am
> interested.
> When a cpu is hot-plugged out, which code in the kernel is responsible
> for migrating the tasks that are executing on such CPU?

sched_cpu_deactivate()
cpuset_cpu_inactive()
cpuset_update_active_cpus()
cpuset_hotplug_workfn()
hotplug_update_tasks_legacy()
hotplug_update_tasks()
set_cpus_allowed_ptr()
__set_cpus_allowed_ptr()


> I was sure I
> was handling all the relevant codepaths, but this bug clearly shows
> that I was wrong.

I remember reviewing your patchset and I too thought you had tackled all the
cases. In function __set_cpus_allowed_ptr() you'll notice two cases are
handled, i.e. the task is running or suspended. I suspect the former to be the
culprit but haven't investigated fully.
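
To make the two cases a bit more concrete, here is a rough user-space toy
model of the decision as I understand it. It is not the kernel code: the
toy_* names, the struct and the enum are all made up for illustration, and the
split between "running", "queued" and "sleeping" is my assumption about how
the running/suspended distinction plays out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the task state that matters here. */
struct toy_task {
	bool running;	/* currently executing on its CPU */
	bool queued;	/* on a runqueue, waiting to execute */
	int cpu;	/* CPU the task currently belongs to */
};

/* Hypothetical outcomes of changing the allowed-CPU mask. */
enum toy_action {
	TOY_NOTHING,		/* no immediate migration is needed */
	TOY_STOPPER_MIGRATE,	/* running task: something else must push it off */
	TOY_MOVE_QUEUED,	/* queued task: can be moved on the spot */
};

/*
 * Toy version of the decision: bit N of new_mask set means CPU N is allowed.
 * dest_cpu is assumed to be a CPU the caller already picked from new_mask.
 */
static enum toy_action toy_set_cpus_allowed(struct toy_task *p,
					    unsigned long new_mask,
					    int dest_cpu)
{
	/* Current CPU is still allowed: nothing to migrate. */
	if ((new_mask >> p->cpu) & 1UL)
		return TOY_NOTHING;

	/*
	 * Running case: the task cannot be moved from here, so the model only
	 * reports that an asynchronous migration is required and leaves
	 * p->cpu untouched.
	 */
	if (p->running)
		return TOY_STOPPER_MIGRATE;

	/* Queued but not running: move it directly to the destination. */
	if (p->queued) {
		p->cpu = dest_cpu;
		return TOY_MOVE_QUEUED;
	}

	/* Sleeping: a CPU will be chosen at the next wakeup. */
	return TOY_NOTHING;
}
```

The running branch is the one I would look at first, since it is the only one
where the task is still on the outgoing CPU after the function returns.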

Regards,
Mathieu



>
>
> Thanks,
> Luca