Re: [PATCH v8 11/19] sched: Allow task CPU affinity to be restricted on asymmetric systems

From: Valentin Schneider
Date: Fri Jun 04 2021 - 13:12:39 EST


On 02/06/21 17:47, Will Deacon wrote:
> +static int restrict_cpus_allowed_ptr(struct task_struct *p,
> +                                     struct cpumask *new_mask,
> +                                     const struct cpumask *subset_mask)
> +{
> +        struct rq_flags rf;
> +        struct rq *rq;
> +        int err;
> +        struct cpumask *user_mask = NULL;
> +
> +        if (!p->user_cpus_ptr) {
> +                user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
> +
> +                if (!user_mask)
> +                        return -ENOMEM;
> +        }
> +
> +        rq = task_rq_lock(p, &rf);
> +
> +        /*
> +         * Forcefully restricting the affinity of a deadline task is
> +         * likely to cause problems, so fail and noisily override the
> +         * mask entirely.
> +         */
> +        if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
> +                err = -EPERM;
> +                goto err_unlock;
> +        }
> +
> +        if (!cpumask_and(new_mask, &p->cpus_mask, subset_mask)) {
> +                err = -EINVAL;
> +                goto err_unlock;
> +        }
> +
> +        /*
> +         * We're about to butcher the task affinity, so keep track of what
> +         * the user asked for in case we're able to restore it later on.
> +         */
> +        if (user_mask) {
> +                cpumask_copy(user_mask, p->cpus_ptr);
> +                p->user_cpus_ptr = user_mask;
> +        }
> +
> +

Shouldn't the user mask be saved before any of the bailouts above, so
that we can potentially restore it even if we end up forcefully
expanding the affinity?
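
Something along these lines is what I have in mind (completely untested,
only to illustrate the ordering; clearing user_mask after the hand-over
keeps the kfree() on the err_unlock path from freeing the saved copy):

        rq = task_rq_lock(p, &rf);

        /*
         * Save what the user asked for *before* any of the bailouts, so
         * the original mask can be restored even if we end up forcefully
         * widening the affinity instead of restricting it.
         */
        if (user_mask) {
                cpumask_copy(user_mask, p->cpus_ptr);
                p->user_cpus_ptr = user_mask;
                user_mask = NULL;
        }

        if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
                err = -EPERM;
                goto err_unlock;
        }
        ...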

> +        return __set_cpus_allowed_ptr_locked(p, new_mask, 0, rq, &rf);
> +
> +err_unlock:
> +        task_rq_unlock(rq, p, &rf);
> +        kfree(user_mask);
> +        return err;
> +}
> +
> +/*
> + * Restrict the CPU affinity of task @p so that it is a subset of
> + * task_cpu_possible_mask() and point @p->user_cpus_ptr to a copy of the
> + * old affinity mask. If the resulting mask is empty, we warn and walk
> + * up the cpuset hierarchy until we find a suitable mask.
> + */
> +void force_compatible_cpus_allowed_ptr(struct task_struct *p)
> +{
> +        cpumask_var_t new_mask;
> +        const struct cpumask *override_mask = task_cpu_possible_mask(p);
> +
> +        alloc_cpumask_var(&new_mask, GFP_KERNEL);
> +
> +        /*
> +         * __migrate_task() can fail silently in the face of concurrent
> +         * offlining of the chosen destination CPU, so take the hotplug
> +         * lock to ensure that the migration succeeds.
> +         */
> +        cpus_read_lock();

I'm thinking the hotplug lock might not be required here with:

http://lore.kernel.org/r/20210526205751.842360-3-valentin.schneider@xxxxxxx

but then again that isn't merged yet :-)

> +        if (!cpumask_available(new_mask))
> +                goto out_set_mask;
> +
> +        if (!restrict_cpus_allowed_ptr(p, new_mask, override_mask))
> +                goto out_free_mask;
> +
> +        /*
> +         * We failed to find a valid subset of the affinity mask for the
> +         * task, so override it based on its cpuset hierarchy.
> +         */
> +        cpuset_cpus_allowed(p, new_mask);
> +        override_mask = new_mask;
> +
> +out_set_mask:
> +        if (printk_ratelimit()) {
> +                printk_deferred("Overriding affinity for process %d (%s) to CPUs %*pbl\n",
> +                                task_pid_nr(p), p->comm,
> +                                cpumask_pr_args(override_mask));
> +        }
> +
> +        WARN_ON(set_cpus_allowed_ptr(p, override_mask));
> +out_free_mask:
> +        cpus_read_unlock();
> +        free_cpumask_var(new_mask);
> +}
> +
> +static int
> +__sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
> +
> +/*
> + * Restore the affinity of a task @p which was previously restricted by a
> + * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
> + * @p->user_cpus_ptr.
> + */
> +void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> +{
> +        unsigned long flags;
> +        struct cpumask *mask = p->user_cpus_ptr;
> +
> +        /*
> +         * Try to restore the old affinity mask. If this fails, then
> +         * we free the mask explicitly to avoid it being inherited across
> +         * a subsequent fork().
> +         */
> +        if (!mask || !__sched_setaffinity(p, mask))
> +                return;
> +
> +        raw_spin_lock_irqsave(&p->pi_lock, flags);
> +        release_user_cpus_ptr(p);
> +        raw_spin_unlock_irqrestore(&p->pi_lock, flags);

AFAICT an affinity change can happen between __sched_setaffinity() and
reacquiring the ->pi_lock here. Right now that can't be another
force_compatible_cpus_allowed_ptr(), because this path is only driven by
arch_setup_new_exec() against current, so we should be fine, but here be
dragons.
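
If this ever grows another caller, something like the below (untested, and
still not airtight if the saved mask gets freed and the allocation reused,
but it sketches the idea) would at least avoid releasing a mask we didn't
just restore:

        raw_spin_lock_irqsave(&p->pi_lock, flags);
        /* Only release the mask if it is still the one restored above. */
        if (p->user_cpus_ptr == mask)
                release_user_cpus_ptr(p);
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);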