On 24/11/20 21:37, Reinette Chatre wrote:
On 11/22/2020 6:24 PM, Valentin Schneider wrote:
This is a small cleanup + a fix for a race I stumbled upon while staring at
resctrl stuff.
Briefly tested on a Xeon Gold 5120 (m2.xlarge.x86 on Equinix) by bouncing
a few tasks around control groups.
...
Thank you very much for taking this on. Unfortunately this race is one
of a few issues with the way in which tasks moving to a new resource
group is handled.
Other issues are:
1.
Until the queued work is run, the moved task runs with old (and even
invalid in the case when its original resource group has been removed)
closid and rmid.
For a userspace task, that queued work should be run as soon as possible
(& relevant). If said task is currently running, then task_work_add() will
lead to an IPI;
the other cases (task moving itself or not currently
running) are covered by the return to userspace path.
Kernel threads however are a prickly matter because they quite explicitly
don't have this return to userspace - they only run their task_work
callbacks on exit. So we currently have to wait for those kthreads to go
through a context switch to update the relevant register, but I don't
see any other alternative that wouldn't involve interrupting every other
CPU (the kthread could move between us triggering some remote work and its
previous CPU receiving the IPI).
2.
Work to update the PQR_ASSOC register is queued every time the user
writes a task id to the "tasks" file, even if the task already belongs
to the resource group and in addition to any other pending work for that
task. This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. This could result in
significant system resource waste, especially on tasks sleeping for a
long time.
Fenghua solved these issues by replacing the callback with a synchronous
update, similar to how tasks are currently moved when a resource group
is deleted. We plan to submit this work next week.
This new solution will make patch 1 and 2 of this series unnecessary. As
I understand it patch 3 would still be a welcome addition but would
require changes. As you prefer you could either submit patch 3 on its
own for the code as it is now and we will rework the task related
changes on top of that, or you could wait for the task related changes
to land first?
Please do Cc me on those - I'll re-evaluate the need for patch 3 then.