Re: [PATCH 1/1] sched/fair: Fix invalid pointer dereference in child_cfs_rq_on_list()

From: Dietmar Eggemann
Date: Thu Mar 06 2025 - 05:10:36 EST


On 06/03/2025 05:48, Aboorva Devarajan wrote:
> On Wed, 2025-03-05 at 10:23 +0100, Dietmar Eggemann wrote:
>> On 05/03/2025 09:21, Vincent Guittot wrote:
>>> On Tue, 4 Mar 2025 at 18:00, Aboorva Devarajan <aboorvad@xxxxxxxxxxxxx> wrote:

[...]

>>>> A recent attempt to reorder fields in struct rq exposed this issue by
>>>> modifying memory offsets and affecting how pointer computations are
>>>> resolved. While the problem existed before, it was previously masked by
>>>> specific field arrangement. The reordering caused erroneous pointer
>>>> accesses, leading to a NULL dereference and a crash, as seen in the
>>
>> I'm running tip/sched/core on arm64 and I still only see the wrong
>> pointer for 'prev_cfs_rq->tg->parent' in the 'prev ==
>> &rq->leaf_cfs_rq_list' case?
>>
>> ...
>> cpu=5 prev_cfs_rq->tg=ffff00097efb63a0 parent=0000000000000010
>> cfs_rq->tg=ffff000802084000
>> ...
>>
>
> Hi Dietmar,
>
> Yes, you are right, I meant that we will still have invalid pointers and use it
> silently in the vanilla kernel, but it won't always lead to a crash.
>
> The crash in this specific case happens if `prev_cfs_rq->tg` points to a memory
> location that cannot be de-referenced. Otherwise, the function de-references and
> uses memory locations that are not valid but did not cause a visible failure so far.
>
> Here are more details on what I meant by reordering the runqueue:
>
> With the system and kernel configuration, I encountered the crash while trying
> to reorder the runqueue structure, here is the minimal change that caused the
> crash on top of v6.14-rc5 kernel:

Ah, OK. You changed the code locally. Somehow I thought you referred to
a change which is already in mainline (or tip/sched/core) and I was
wondering which one it would be.

[...]

> But looks like a patch similar to this is merged yesterday [1], so this can
> be ignored :)
>
>
> [1] https://lore.kernel.org/all/174119292742.14745.16827644501260146974.tip-bot2@tip-bot2/

Ah, OK, same idea though.

BTW, the:

} else {
prev = rq->tmp_alone_branch;
}

path is taken when dealing with CONFIG_CFS_BANDWIDTH and throttling
scenarios so this is important to cover as well.