Re: [PATCH] sched/fair: Add null pointer check to pick_next_entity()
From: Rik van Riel
Date: Mon Apr 14 2025 - 15:59:44 EST
On Wed, 2025-04-02 at 10:22 +0200, Peter Zijlstra wrote:
>
> Please confirm what the reason for overflow is.
>
Running a large enough sample size has its benefits.
We have hit 3 out of the 4 warnings below.
The only one we did not hit is the cfs_rq->avg_load != avg_load
warning.
Most of the time we seem to hit the warnings from the
code that removes tasks from the runqueue, but we are
occasionally seeing it when adding tasks to the runqueue,
as well.
> +++ b/kernel/sched/fair.c
> @@ -620,12 +620,36 @@ static inline s64 entity_key(struct cfs_
> *
> * As measured, the max (key * weight) value was ~44 bits for a
> kernel build.
> */
> +
> +static void avg_vruntime_validate(struct cfs_rq *cfs_rq)
> +{
> + unsigned long load = 0;
> + s64 vruntime = 0;
> +
> + for (struct rb_node *node = rb_first_cached(&cfs_rq-
> >tasks_timeline);
> + node; node = rb_next(node)) {
> + struct sched_entity *se = __node_2_se(node);
> + unsigned long weight = scale_load_down(se-
> >load.weight);
> + s64 key = entity_key(cfs_rq, se);
> + /* vruntime += key * weight; */
> + WARN_ON_ONCE(__builtin_mul_overflow(key, weight,
> &key));
> + WARN_ON_ONCE(__builtin_add_overflow(vruntime, key,
> &vruntime));
> + load += weight;
> + }
> +
> + WARN_ON_ONCE(cfs_rq->avg_load != load);
> + WARN_ON_ONCE(cfs_rq->avg_vruntime != vruntime);
> +}
> +
> static void
> avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> unsigned long weight = scale_load_down(se->load.weight);
> s64 key = entity_key(cfs_rq, se);
>
> + /* not yet added to tree */
> + avg_vruntime_validate(cfs_rq);
> +
> cfs_rq->avg_vruntime += key * weight;
> cfs_rq->avg_load += weight;
> }
> @@ -638,6 +662,9 @@ avg_vruntime_sub(struct cfs_rq *cfs_rq,
>
> cfs_rq->avg_vruntime -= key * weight;
> cfs_rq->avg_load -= weight;
> +
> + /* already removed from tree */
> + avg_vruntime_validate(cfs_rq);
> }
>
> static inline
>
--
All Rights Reversed.