Re: [PATCH] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr

From: Valentin Schneider
Date: Tue Sep 24 2019 - 12:12:23 EST


On 24/09/2019 15:09, Dietmar Eggemann wrote:
> On 9/23/19 6:06 PM, Valentin Schneider wrote:
>> On 23/09/2019 16:43, Dietmar Eggemann wrote:
>>> I'm not sure that CONFIG_DEBUG_PER_CPU_MAPS=y will help you here.
>>>
>>> __set_cpus_allowed_ptr(...)
>>> {
>>> ...
>>> dest_cpu = cpumask_any_and(...);
>>> ...
>>> }
>>>
>>> With:
>>>
>>> #define cpumask_any_and(mask1, mask2) cpumask_first_and((mask1), (mask2))
>>> #define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p))
>>>
>>> cpumask_next_and() is called with n = -1 and in this case does not
>>> invoke cpumask_check().
>>>
>>
>> It won't warn here because it's still a valid return value, but it should
>> warn in the cpumask_test_cpu() that follows (in is_cpu_allowed()) because
>> it would be passed a value >= nr_cpu_ids. So at the very least this config
>> does catch cpumask_any*() return values being blindly passed to
>> cpumask_test_cpu().
>
> OK, I see and agree.
>
> But IMHO, we still don't call cpumask_test_cpu(dest_cpu, ...), right?
>
> What the patch fixes is that it closes the window between two reads of
> cpu_active_mask in which cpuhp can potentially punch a hole into the
> cpu_active_mask.
>
> If p is not running or queued and its state is not TASK_WAKING,
> a 'dest_cpu == nr_cpu_ids' goes unnoticed.

In this case we don't need to force it off to another CPU, since that will
get sorted out at its next wakeup. However, the patch still catches that
case, since it bails out early with

if (dest_cpu >= nr_cpu_ids) {
	ret = -EINVAL;
	goto out;
}
and that's regardless of the task's state.

> Otherwise we see an 'unable
> to handle kernel paging request' or 'unable to handle page fault for
> address' bug in migration_cpu_stop() or move_queued_task().
>
> Am I missing something?
>
> [...]
>