Re: [PATCH 1/3] sched/rt: cpupri_find: implement fallback mechanism for !fit case
From: Valentin Schneider
Date: Mon Feb 17 2020 - 12:07:48 EST
On 2/14/20 4:39 PM, Qais Yousef wrote:
> When searching for the best lowest_mask with a fitness_fn passed, make
> sure we record the lowest_level that returns a valid lowest_mask so that
> we can use that as a fallback in case we fail to find a fitting CPU at
> all levels.
> 
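(For context, my reading of the fallback: remember the first level that
yields a valid mask, and fall back to it if no level passes the fitness
check - which also covers the "can't run on the fitting CPU due to
priority" case below. Rough sketch only, not the actual hunk; the helper
names are from this patch but the loop is my paraphrase:)

	int best_unfit_idx = -1;

	for (idx = 0; idx < task_pri; idx++) {
		if (!__cpupri_find(cp, p, lowest_mask, idx))
			continue;

		if (!fitness_fn)
			return 1;

		/* Remember the lowest level with a valid (unfit) mask */
		if (best_unfit_idx == -1)
			best_unfit_idx = idx;

		/* Keep only the CPUs that fit */
		for_each_cpu(cpu, lowest_mask) {
			if (!fitness_fn(p, cpu))
				cpumask_clear_cpu(cpu, lowest_mask);
		}

		if (!cpumask_empty(lowest_mask))
			return 1;
	}

	/* No fitting CPU found: fall back to the best unfitting level */
	if (best_unfit_idx != -1)
		return __cpupri_find(cp, p, lowest_mask, best_unfit_idx);

	return 0;
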
> The intention of the original patch was to disallow downmigration to an
> unfitting CPU. But this missed the case where the task is already
> running on an unfitting one.
> 
> With this change, RT tasks can still move between unfitting CPUs when
> they're already running on such a CPU.
> 
> And, as Steve suggested, to adhere to the strict priority rules of RT:
> if a task is already running on a fitting CPU but can't run on it due
> to priority, allow it to downmigrate to an unfitting CPU so it can run.
> 
> Reported-by: Pavan Kondeti <pkondeti@xxxxxxxxxxxxxx>
> Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> ---
>  kernel/sched/cpupri.c | 157 +++++++++++++++++++++++++++---------------
>  1 file changed, 101 insertions(+), 56 deletions(-)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index 1a2719e1350a..1bcfa1995550 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -41,6 +41,59 @@ static int convert_prio(int prio)
>  	return cpupri;
>  }
>  
> +static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
> +				struct cpumask *lowest_mask, int idx)
> +{
> +	struct cpupri_vec *vec  = &cp->pri_to_cpu[idx];
> +	int skip = 0;
> +
> +	if (!atomic_read(&(vec)->count))
> +		skip = 1;
> +	/*
> +	 * When looking at the vector, we need to read the counter,
> +	 * do a memory barrier, then read the mask.
> +	 *
> +	 * Note: This is still all racey, but we can deal with it.
> +	 *  Ideally, we only want to look at masks that are set.
> +	 *
> +	 *  If a mask is not set, then the only thing wrong is that we
> +	 *  did a little more work than necessary.
> +	 *
> +	 *  If we read a zero count but the mask is set, because of the
> +	 *  memory barriers, that can only happen when the highest prio
> +	 *  task for a run queue has left the run queue, in which case,
> +	 *  it will be followed by a pull. If the task we are processing
> +	 *  fails to find a proper place to go, that pull request will
> +	 *  pull this task if the run queue is running at a lower
> +	 *  priority.
> +	 */
> +	smp_rmb();
> +
> +	/* Need to do the rmb for every iteration */
> +	if (skip)
> +		return 0;
> +
> +	if (cpumask_any_and(p->cpus_ptr, vec->mask) >= nr_cpu_ids)
> +		return 0;
> +
> +	if (lowest_mask) {
> +		cpumask_and(lowest_mask, p->cpus_ptr, vec->mask);
> +
> +		/*
> +		 * We have to ensure that we have at least one bit
> +		 * still set in the array, since the map could have
> +		 * been concurrently emptied between the first and
> +		 * second reads of vec->mask.  If we hit this
> +		 * condition, simply act as though we never hit this
> +		 * priority level and continue on.
> +		 */
> +		if (cpumask_empty(lowest_mask))
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +
Just a drive-by comment; could you split that code move into its own patch?
It'd make the history a bit easier to read IMO.
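
i.e. patch 1 would be the purely mechanical extraction, with
cpupri_find() keeping its existing loop but delegating the per-level
check (again just a sketch of what the no-functional-change version
would look like):

	for (idx = 0; idx < task_pri; idx++) {
		if (!__cpupri_find(cp, p, lowest_mask, idx))
			continue;

		return 1;
	}

	return 0;

and the fallback logic would then land on top as a much smaller diff.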