Re: [PATCH RESEND] sched/fair: Fix wrong cpu selecting from isolated domain

From: Xunlei Pang
Date: Thu Sep 24 2020 - 04:54:48 EST


On 9/24/20 3:18 PM, Vincent Guittot wrote:
> On Thu, 24 Sep 2020 at 08:48, Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx> wrote:
>>
>> We have occasionally seen tasks with a full cpumask
>> (e.g. after being put into a cpuset or set to full affinity)
>> migrated to our isolated cpus in a production environment.
>>
>> After some analysis, we found that it is due to the current
>> select_idle_smt() not considering the sched_domain mask.
>>
>> Steps to reproduce on my 32-CPU hyperthreaded machine:
>> 1. with boot parameter: "isolcpus=domain,2-31"
>> (thread lists: 0,16 and 1,17)
>> 2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
>> 3. some threads will be migrated to the isolated cpu16~17.
>>
>> Fix it by checking the valid domain mask in select_idle_smt().
>>
>> Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")
>> Reported-by: Wetp Zhang <wetp.zy@xxxxxxxxxxxxxxxxx>
>> Reviewed-by: Jiang Biao <benbjiang@xxxxxxxxxxx>
>> Signed-off-by: Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx>
>
> Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>

Thanks, Vincent :-)
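
For readers following the thread, here is a minimal userspace sketch of the check described in the commit message. It is not the kernel code: the names cpumask32_t, mask_test_cpu(), smt_siblings() and pick_idle_sibling() are hypothetical, CPUs are modeled as bits of a 32-bit word, and the sibling layout (N paired with N+16) is assumed from the "0,16 and 1,17" thread lists above. The check_domain flag switches between the behaviour before and after the fix.

```c
#include <stdint.h>

/* Hypothetical model: bit N of a mask stands for CPU N (32 CPUs). */
typedef uint32_t cpumask32_t;

static int mask_test_cpu(int cpu, cpumask32_t mask)
{
	return (int)((mask >> cpu) & 1u);
}

/* Assumed topology: CPU N and CPU N+16 are SMT siblings. */
static cpumask32_t smt_siblings(int cpu)
{
	int core = cpu % 16;

	return (1u << core) | (1u << (core + 16));
}

/*
 * Return an idle SMT sibling of 'target' that the task may run on,
 * or -1 if there is none. With check_domain set, siblings outside
 * the scheduling domain span are skipped as well, which is the
 * extra test the patch description asks for.
 */
static int pick_idle_sibling(int target, cpumask32_t allowed,
			     cpumask32_t domain_span, cpumask32_t idle,
			     int check_domain)
{
	cpumask32_t siblings = smt_siblings(target);
	int cpu;

	for (cpu = 0; cpu < 32; cpu++) {
		if (!mask_test_cpu(cpu, siblings))
			continue;
		if (!mask_test_cpu(cpu, allowed))
			continue;
		if (check_domain && !mask_test_cpu(cpu, domain_span))
			continue;
		if (mask_test_cpu(cpu, idle))
			return cpu;
	}

	return -1;
}
```

With "isolcpus=domain,2-31" the domain span in this model is 0x3 (CPUs 0 and 1). For a task with a full cpumask, a busy CPU 0 and target 0, the unchecked variant picks the isolated CPU 16, while the checked variant returns -1 so the caller has to look elsewhere.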