Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
From: Valentin Schneider
Date: Mon Feb 17 2020 - 12:07:48 EST
On 2/14/20 4:39 PM, Qais Yousef wrote:
> When searching for the best lowest_mask with a fitness_fn passed, make
> sure we record the lowest_level that returns a valid lowest_mask so that
> we can use that as a fallback in case we fail to find a fitting CPU at
> all levels.
>
> The intention in the original patch was not to allow a down migration to
> unfitting CPU. But this missed the case where we are already running on
> unfitting one.
>
> With this change now RT tasks can still move between unfitting CPUs when
> they're already running on such CPU.
>
> And as Steve suggested; to adhere to the strict priority rules of RT, if
> a task is already running on a fitting CPU but due to priority it can't
> run on it, allow it to downmigrate to unfitting CPU so it can run.
>
> Reported-by: Pavan Kondeti <pkondeti@xxxxxxxxxxxxxx>
> Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> ---
> kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
> 1 file changed, 101 insertions(+), 56 deletions(-)
>
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index 1a2719e1350a..1bcfa1995550 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -41,6 +41,59 @@ static int convert_prio(int prio)
> return cpupri;
> }
>
> +static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
> + struct cpumask *lowest_mask, int idx)
> +{
> + struct cpupri_vec *vec = &cp->pri_to_cpu[idx];
> + int skip = 0;
> +
> + if (!atomic_read(&(vec)->count))
> + skip = 1;
> + /*
> + * When looking at the vector, we need to read the counter,
> + * do a memory barrier, then read the mask.
> + *
> + * Note: This is still all racey, but we can deal with it.
> + * Ideally, we only want to look at masks that are set.
> + *
> + * If a mask is not set, then the only thing wrong is that we
> + * did a little more work than necessary.
> + *
> + * If we read a zero count but the mask is set, because of the
> + * memory barriers, that can only happen when the highest prio
> + * task for a run queue has left the run queue, in which case,
> + * it will be followed by a pull. If the task we are processing
> + * fails to find a proper place to go, that pull request will
> + * pull this task if the run queue is running at a lower
> + * priority.
> + */
> + smp_rmb();
> +
> + /* Need to do the rmb for every iteration */
> + if (skip)
> + return 0;
> +
> + if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> + return 0;
> +
> + if (lowest_mask) {
> + cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +
> + /*
> + * We have to ensure that we have at least one bit
> + * still set in the array, since the map could have
> + * been concurrently emptied between the first and
> + * second reads of vec->mask. If we hit this
> + * condition, simply act as though we never hit this
> + * priority level and continue on.
> + */
> + if (cpumask_empty(lowest_mask))
> + return 0;
> + }
> +
> + return 1;
> +}
> +
Just a drive-by comment; could you split that code move into its own patch?
It'd make the history a bit easier to read IMO.