Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()

From: K Prateek Nayak
Date: Tue Sep 12 2023 - 10:27:10 EST


Hello Chenyu,

On 9/12/2023 6:02 PM, Chen Yu wrote:
> [..snip..]
>
>>> If I understand correctly, WF_SYNC is to let the wakee be woken up
>>> on the waker's CPU, rather than the wakee's previous CPU, because
>>> the waker goes to sleep after wakeup. SIS_CACHE mainly cares about
>>> wakee's previous CPU. We can only ensure that another wakee does not
>>> occupy the previous CPU, but do not enhance the possibility that
>>> wake_affine_idle() chooses the previous CPU.
>>
>> Correct me if I'm wrong here,
>>
>> Say a short sleeper is always woken up using the WF_SYNC flag. When the
>> task is dequeued, we mark the previous CPU where it ran as "cache-hot"
>> and restrict any wakeup happening until the "cache_hot_timeout" is
>> crossed. Let us assume a perfect world where the task wakes up before
>> the "cache_hot_timeout" expires. Logically this CPU was reserved all
>> this while for the short sleeper but since the wakeup bears the WF_SYNC
>> flag, the whole reservation is ignored and waker's LLC is explored.
>>
>
> Ah, I see your point. Do you mean, because the waker has a WF_SYNC, wake_affine_idle()
> forces the short sleeping wakee to be woken up on waker's CPU rather than the
> wakee's previous CPU, but wakee's previous CPU has been marked as cache hot
> for nothing?

Precisely :)

>
>> Should the timeout be cleared if the wakeup decides to not target the
>> previous CPU? (The default "sysctl_sched_migration_cost" is probably
>> small enough to curb any side effect that could possibly show here but
>> if a genuine use-case warrants setting "sysctl_sched_migration_cost" to
>> a larger value, the wakeup path might be affected where a lot of idle
>> targets are overlooked since the CPUs are marked cache-hot for a longer
>> duration)
>>
>> Let me know what you think.
>>
>
> This makes sense. In theory the above logic can be added in
> select_idle_sibling(), if target CPU is chosen rather than
> the previous CPU, the previous CPU's cache hot flag should be
> cleared.
>
> But this might bring overhead. Because we need to grab the rq
> lock and write to other CPU's rq, which could be costly. It
> seems to be a trade-off of current implementation.

I agree, it will not be pretty. Maybe the other way is to have a
history of the type of wakeup the task experiences (similar to
wakee_flips but for sync and non-sync wakeups) and only reserve
the CPU if the task wakes up more via non-sync wakeups? Thinking
out loud here.
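
To make the idea a bit more concrete, something along these lines is
what I have in mind. This is only a user-space sketch, not a patch:
"struct wakeup_hist", record_wakeup(), should_reserve_prev_cpu() and
HIST_MAX are all made-up names for illustration, and the halving on
saturation is just one possible way to age out old behaviour.

```c
#include <stdbool.h>

/* Hypothetical per-task wakeup history; nothing like this exists in the kernel today. */
struct wakeup_hist {
	unsigned int sync;	/* aged count of WF_SYNC wakeups */
	unsigned int non_sync;	/* aged count of all other wakeups */
};

/* Assumed saturation point; picked arbitrarily for the sketch. */
#define HIST_MAX 64U

/* Account one wakeup of the task. */
static void record_wakeup(struct wakeup_hist *h, bool is_sync)
{
	if (is_sync)
		h->sync++;
	else
		h->non_sync++;

	/* Halve both counters once either saturates so stale history fades. */
	if (h->sync >= HIST_MAX || h->non_sync >= HIST_MAX) {
		h->sync >>= 1;
		h->non_sync >>= 1;
	}
}

/* Reserve the previous CPU only if non-sync wakeups dominate the history. */
static bool should_reserve_prev_cpu(const struct wakeup_hist *h)
{
	return h->non_sync > h->sync;
}
```

The dequeue path would then consult should_reserve_prev_cpu() before
marking the previous CPU cache-hot, so a task that is mostly woken
with WF_SYNC never reserves a CPU it is unlikely to return to.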

> On the other
> hand, if the user sets the sysctl_sched_migration_cost to a quite
> large value:
> 1. Without SIS_CACHE, there is no task migration.

But that is in the load balancing path. I think the wakeup path will
still migrate the task. But I believe there might be very few cases
where all CPUs are marked cache-hot and the SIS_UTIL will not bail
out straight away as a result of high utilization. Probably a rare
scenario.

> 2. With SIS_CACHE enabled, all idle CPUs are cache hot and are skipped
> in select_idle_cpu(), the wakee will be woken up locally.
> It seems to be of the same effect, so there is not much impact
> to wakeup behavior I suppose.
>
> [..snip..]
>

--
Thanks and Regards,
Prateek