Re: [PATCH 0/4] sched: Various reweight_entity() fixes

From: Shubhang Kaushik

Date: Tue Feb 17 2026 - 17:02:35 EST


Hi Prateek,

On Mon, 16 Feb 2026, K Prateek Nayak wrote:

Hello Shubhang,

On 2/14/2026 12:50 PM, Shubhang Kaushik wrote:
Hi Peter,

On Fri, 30 Jan 2026, Peter Zijlstra wrote:

Two issues related to reweight_entity() were raised; poking at all that got me
these patches.

They're in queue.git/sched/core and I spent most of yesterday staring at traces
trying to find anything wrong. So far, so good.

Please test.



I’m seeing a consistent NULL pointer dereference in pick_task_fair() when running hackbench on an Ampere Altra (80 cores arm64). This is happening after applying the complete patchset on the latest 6.19.0+ kernel with PREEMPT_DYNAMIC (full), CONFIG_SCHED_CLUSTER and NOHZ_FULL enabled.

Can you confirm you are using the latest changes from:

git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

at commit bdba3187771c ("sched/fair: Use full weight to __calc_delta()")
since this series has undergone some churn throughout the week?


Okay. I was not using commit bdba3187771c in my first test. After updating to that commit on sched/core, the issue is resolved on my machine.


The system triggers a level 2 translation fault because pick_eevdf() returns NULL despite the runqueue having active tasks (cfs_rq->nr_running > 0).
When pick_next_task_fair() attempts to dereference this NULL pointer
to access the task structure, the kernel Oopses at pick_task_fair+0x48/0x148.

pick_task_fair <- pick_eevdf() <- [active tasks]

The root cause is an underflow in reweight_entity():
se->vprot -= avruntime;

But how does it cause failure to pick? All vprot does at pick is:

/* curr needs to be queued and eligible first. */
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL;


Because of the precision drift, the min_vruntime jumps ahead of the actual task runtime. This made the task look ineligible, so curr was being set to NULL.


/* vprot only helps for early return below. */
if (curr && protect && protect_slice(curr))
return curr;


Since curr was already NULL from the step above, the code skipped this protection check entirely. It then tried to search the rest of the tree for a new task. But again, as the math was broken it found nothing and returned NULL. That’s why the kernel crashed.
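
To make the failure mode above concrete, here is a toy userspace model. It is not the kernel's pick_eevdf(): the toy_* names are hypothetical, the eligibility test is simplified to "vruntime at or below a reference point", and weights are ignored. It only shows that an earliest-deadline pick restricted to eligible entities returns NULL once the reference point sits below every queued vruntime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sched_entity; only the fields the toy needs. */
struct toy_entity {
	int64_t vruntime;	/* virtual runtime relative to some zero point */
	int64_t deadline;	/* virtual deadline */
};

/* Simplified eligibility: the entity has not run ahead of the reference. */
bool toy_eligible(const struct toy_entity *se, int64_t avg)
{
	return se->vruntime <= avg;
}

/* Earliest deadline among eligible entities; NULL when none qualify. */
const struct toy_entity *toy_pick(const struct toy_entity *q, size_t n,
				  int64_t avg)
{
	const struct toy_entity *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!toy_eligible(&q[i], avg))
			continue;
		if (!best || q[i].deadline < best->deadline)
			best = &q[i];
	}
	return best;
}
```

In this toy, a plain mean of the queued vruntimes can never be below the smallest one, so a consistent reference always leaves something to pick. A reference that has drifted below every entry leaves nothing, and a caller that dereferences the result without a NULL check faults.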


Worst case "curr" runs till it loses eligibility even if vprot wraps
and becomes a large value and moreover, wouldn't vruntime_cmp() in
protect_slice() which compares the signed difference catch it?

--
Thanks and Regards,
Prateek



The vruntime_cmp() should handle the math, but the drift made the comparison logic invalid. The rounding errors in set_protect_slice()
caused vprot and vruntime to be out of sync, breaking the eligibility
invariants.
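
For reference, the signed-difference idiom Prateek mentions can be sketched as below. This is an illustration of the general technique, not the actual vruntime_cmp() implementation; vtime_before() and vtime_before_naive() are hypothetical names. The cast relies on the usual two's-complement conversion, and the result is only meaningful while the two values are less than 2^63 apart.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a is "before" b on a wrapping 64-bit clock.
 * The unsigned subtraction wraps modulo 2^64, and the signed cast recovers
 * the ordering as long as the real distance is below 2^63.
 */
bool vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Naive unsigned comparison, for contrast: it breaks across a wrap. */
bool vtime_before_naive(uint64_t a, uint64_t b)
{
	return a < b;
}
```

With a just below the wrap point and b just past it, vtime_before() still orders them correctly while the naive version does not. The idiom only copes with wraparound, though. It cannot help if the two operands no longer share the same zero point.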

Commit bdba3187771c restores the symmetry using 64-bit math which
prevents the NULL return.

Regards,
Shubhang Kaushik