On Thu, Jan 12, 2023 at 04:56:07PM +0000, Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
+static int drmcs_can_attach(struct cgroup_taskset *tset)
+{
+ int ret;
+
+ /*
+ * As processes are moved between groups we need to ensure both that
+ * the old group does not see a sudden downward jump in GPU
+ * utilisation, and that the new group does not see a sudden upward
+ * jump from all the GPU time that clients belonging to the migrated
+ * process have accumulated.
+ *
+ * To achieve that we suspend the scanner until the migration has
+ * completed; the resume at the end ensures both groups start
+ * observing GPU utilisation from a reset state.
+ */
+
+ ret = mutex_lock_interruptible(&drmcg_mutex);
+ if (ret)
+ return ret;
+ start_suspend_scanning();
+ mutex_unlock(&drmcg_mutex);
+
+ finish_suspend_scanning();
Here the scanning suspension is communicated via
root_drmcs.scanning_suspended = true;
root_drmcs.suspended_period_us = root_drmcs.period_us;
root_drmcs.period_us = 0;
but I don't see these flags checked in scan_worker(), so the scanning
traversal can apparently run concurrently with a task migration.
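What I would have expected is something along these lines (a rough
sketch only; I am guessing at scan_worker() internals, and the
"do the traversal" comment stands in for whatever the real pass does):

	static void scan_worker(struct work_struct *work)
	{
		mutex_lock(&drmcg_mutex);
		if (root_drmcs.scanning_suspended) {
			/* A migration is in flight, skip this pass. */
			mutex_unlock(&drmcg_mutex);
			return;
		}
		/* ... do the traversal while migrations are excluded ... */
		mutex_unlock(&drmcg_mutex);
	}

That way drmcs_can_attach(), which takes the same mutex before raising
scanning_suspended, cannot interleave with a pass that is already under
way.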
[...]
+static bool
+__start_scanning(struct drm_cgroup_state *root, unsigned int period_us)
[...]
+ css_for_each_descendant_post(node, &root->css) {
[...]
+ active = drmcs_get_active_time_us(drmcs);
+ if (period_us && active > drmcs->prev_active_us)
+ drmcs->active_us += active - drmcs->prev_active_us;
+ drmcs->prev_active_us = active;
drmcs_get_active_time_us() could count a task's contribution here,
then the task could migrate to a different drmcs, and its time would
be counted a second time.
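One way to close that window, together with the scanning_suspended
check sketched above, would be to take drmcg_mutex around the whole
traversal so it serialises against drmcs_can_attach(). Just a sketch,
assuming drmcs_get_active_time_us() may safely be called with the
mutex held and that css_to_drmcs() is the usual css-to-state helper:

	mutex_lock(&drmcg_mutex);
	rcu_read_lock();
	css_for_each_descendant_post(node, &root->css) {
		struct drm_cgroup_state *drmcs = css_to_drmcs(node);
		u64 active;

		/*
		 * drmcs_can_attach() takes drmcg_mutex first, so no task
		 * can start changing groups while this pass runs.
		 */
		active = drmcs_get_active_time_us(drmcs);
		if (period_us && active > drmcs->prev_active_us)
			drmcs->active_us += active - drmcs->prev_active_us;
		drmcs->prev_active_us = active;
	}
	rcu_read_unlock();
	mutex_unlock(&drmcg_mutex);

Whether holding the mutex across a full descendant walk is acceptable
latency-wise is a separate question, of course.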