Re: [RESEND PATCH v3 0/7] Improve scheduler scalability for fast path

From: Subhra Mazumdar
Date: Tue Jul 02 2019 - 23:54:41 EST



On 7/2/19 1:54 AM, Patrick Bellasi wrote:
>> Wondering if searching and preempting needs will ever be conflicting?
> I guess the winning point is that we don't commit behaviors to
> userspace, but just abstract concepts which are turned into biases.
>
> I don't see conflicts right now: if you are latency tolerant, that
> means you can spend more time trying to find a better CPU (e.g. we can
> use the energy model to compare multiple CPUs) _and/or_ give the
> current task a better chance to complete by delaying its preemption.
OK

>> Otherwise sounds like a good direction to me. For the searching aspect, can
>> we map latency nice values to the % of cores we search in select_idle_cpu?
>> Thus the search cost can be controlled by the latency nice value.
> I guess that's worth a try; the only caveat I see is that it turns the
> bias into something very platform specific. Meaning, the same
> latency-nice value on different machines can have very different
> results.
>
> Wouldn't it be better to find a more platform-independent mapping?
>
> Maybe something time bounded, e.g. the higher the latency-nice, the more
> time we can spend looking for CPUs?
The issue I see is that if we have a range of latency-nice values, it
should cover the entire range of search (one core to all cores). As Peter
said, some workloads will want to search the LLC fully. If we use absolute
time, the mapping from the latency-nice range to it will be arbitrary. If
you have something in mind, let me know; maybe I am thinking about it
differently.

>> But the issue is that if more latency tolerant workloads are set to search
>> less, we still need some mechanism to achieve a good spread of
>> threads.
> I don't get this example: why should more latency tolerant workloads
> require less search?
I guess I got the definition of "latency tolerant" backwards.

>> Can we keep the sliding window mechanism in that case?
> Which one? Sorry, I did not go through the patches; can you briefly
> summarize the idea?
If a workload has set it to low latency tolerance, then the search will be
shorter. That can lead to threads being localized on a few CPUs, as we are
not searching the entire LLC even if there are idle CPUs available. For this
I had introduced a per-CPU variable (for the target CPU) to track the
boundary of the search, so that every search starts from where the previous
one stopped, thus sliding the window. So even if we search very few CPUs,
the search window keeps shifting and gives us a good spread. This is
orthogonal to the latency-nice thing.

>> Also, will latency nice do anything for select_idle_core and
>> select_idle_smt?
> I guess in principle the same bias can be used at different levels, maybe
> with different mappings.
Doing it for select_idle_core will have the issue that the dynamic flag
(whether an idle core is present or not) can only be updated by threads
which are doing the full search.

Thanks,
Subhra

> In the mobile world use-case we will likely use it only to switch from
> select_idle_sibling to the energy-aware slow path. And perhaps to see
> if we can bias the wakeup preemption granularity.
>
> Best,
> Patrick