Re: [PATCH v2 4/4] sched/fair: Prevent active LB from preempting higher sched classes

From: Valentin Schneider
Date: Fri Aug 30 2019 - 11:44:12 EST


On 29/08/2019 15:19, Vincent Guittot wrote:
[...]
>> Right, if we end up kicking the cpu_stopper this can still happen (since
>> we drop the lock). Thing is, you can't detect it on the cpu_stopper side,
>> since the currently running is obviously not going to be CFS (and it's
>> too late anyway, we already preempted whatever was running there). Though
>> I should probably change the name of the patch to reflect that it's not a
>> 100% cure.
>>
>> I tweaked the nr_running check of the cpu_stop callback in patch 3/4 to try
>> to bail out early, but AFAICT that's the best we can do without big changes
>> elsewhere.
>>
>> If we wanted to prevent those preemptions at all cost, I suppose we'd want
>
> I'm not sure that it's worth the effort and the complexity
>

My point exactly :)

[...]
>> I had this initially but convinced myself out of it: since we hold no
>> lock in need_active_balance(), the information we get on the current task
>> (and, arguably, on the h_nr_running) is too volatile to be of any use.
>
> But since the lock is released anyway, everything will always be too
> volatile in this case.

We do release the lock if we go kick the cpu_stopper, but we can
nevertheless make that decision with the most up-to-date information. I'd
say it's for similar reasons that we check busiest->curr->cpus_ptr right
before kicking the cpu_stopper rather than in need_active_balance().

The majority of the checks in need_active_balance() (all but one) depend
on env/sd stats which aren't volatile.

>>
>> I do believe those checks have their place in active_load_balance()'s
>> critical section, as that's the most accurate we're going to get. On the
>> plus side, if we *do* detect the remote rq's current task isn't CFS, we
>> can run detach_one_task() locally, which is an improvement IMO.
>
> This add complexity in the code by adding another path to detach attach task(s).

Note that it's not a new detach/attach per se, rather it's about doing it
in active_load_balance() rather than active_load_balance_cpu_stop() in some
cases.
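
To illustrate the idea, here is a rough userspace sketch of the decision
taken inside the critical section. It is a toy model under stated
assumptions, not the code from the patch: every toy_* name is made up for
this example, and the atomic_flag spinlock merely stands in for the rq lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for scheduler concepts; none of these are kernel APIs. */
enum toy_class { TOY_CFS, TOY_RT, TOY_DL };

enum toy_action {
	TOY_KICK_STOPPER,	/* curr is CFS: let the stopper push it away */
	TOY_DETACH_LOCALLY,	/* curr is a higher class: pull a queued CFS task ourselves */
	TOY_BAIL_OUT,		/* nothing to do, wait for the next load balance */
};

struct toy_rq {
	atomic_flag lock;		/* toy spinlock standing in for the rq lock */
	enum toy_class curr_class;	/* class of the currently running task */
	unsigned int cfs_h_nr_running;	/* runnable CFS tasks on this rq */
	bool active_balance;		/* a stopper kick is already pending */
};

static void toy_rq_lock(struct toy_rq *rq)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
		;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/*
 * Decide what to do about the busiest rq while holding its lock, so that
 * curr_class and cfs_h_nr_running are read at the last possible moment.
 */
enum toy_action toy_active_balance_decide(struct toy_rq *busiest)
{
	enum toy_action action = TOY_BAIL_OUT;

	toy_rq_lock(busiest);
	if (busiest->cfs_h_nr_running == 0) {
		/* No CFS task left to move. */
		action = TOY_BAIL_OUT;
	} else if (busiest->curr_class != TOY_CFS) {
		/* Kicking the stopper would preempt a non-CFS task. */
		action = TOY_DETACH_LOCALLY;
	} else if (!busiest->active_balance) {
		/* Mark the kick as pending; the caller kicks after unlocking. */
		busiest->active_balance = true;
		action = TOY_KICK_STOPPER;
	}
	toy_rq_unlock(busiest);

	return action;
}
```

In this model the state can still change once the lock is dropped and
before the stopper actually runs, which is the remaining window discussed
above.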

> We could simply bail out and wait the next load balance (which is
> already the case sometime) or if you really want to detach a task jump
> back to more_balance
>

A simple bail-out is what I had in v1, but following Qais' comments I
figured I could add the detach_one_task().

Jumping back to more_balance is quite different from doing a detach_one_task().