On 10/07/2020 01:08, chris hyser wrote:
[...]
D) Desired behavior:
Reduce the maximum wake-up latency of designated CFS tasks by skipping
some or all of the idle CPU and core searches by setting a maximum idle
CPU search value (maximum loop iterations).
Searching 'ALL' as the maximum would be the default and implies the
current code path which may or may not search up to ALL. Searching 0
would result in the least latency (shown with experimental results to be
included if/when patchset goes up). One of the considerations is that
the maximum length of the search is a function of the size of the LLC
scheduling domain and this is platform dependent. Whether 'some', i.e. a
numerical value limiting the search, can be used to "normalize" this
latency across differing scheduling domain sizes is under investigation.
Clearly differing hardware will have many other significant differences,
but in different sized and dynamically sized VMs running on fleets of
common HW this may be interesting.
I assume that this task-specific feature could coexist in
select_idle_core() and select_idle_cpu() with the already existing
runtime heuristics (test_idle_cores() and the two sched features
mentioned under E/F) to reduce the idle CPU search space on a busy system.
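
To make "coexist" concrete, here is a minimal sketch of one possible
combination rule. It assumes the per-task limit and the system-wide
heuristic each yield a search budget, and that the tighter of the two
wins. The function name and both parameters are hypothetical; nothing
here is taken from the kernel or from the proposed patchset.

```c
#include <limits.h>

/*
 * Hypothetical helper: combine a per-task search cap with a budget
 * computed by a system-wide runtime heuristic. INT_MAX on either side
 * means "no limit from this source".
 */
static int effective_search_limit(int task_max_search, int heuristic_budget)
{
    /* The smaller budget wins, so neither mechanism can widen the other. */
    return task_max_search < heuristic_budget ? task_max_search
                                              : heuristic_budget;
}
```

With this rule a task that sets 0 always skips the search, and a task
left at 'ALL' simply gets whatever the heuristic allows.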
E/F) Existing knobs (and limitations):
There are existing sched_feat entries, SIS_AVG_CPU and SIS_PROP, that attempt to
short circuit the idle cpu search path in select_idle_cpu() based on
estimations of the current costs of searching. Neither provides a means
[...]
H) Range Analysis:
The knob is a non-negative integer representing "max number of CPUs to
search". The default would be 'ALL' which could be translated as
INT_MAX. '0 searches' translates to 0. Other values represent a max
limit on the search, in this case iterations of a for loop.
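
The range described above could be sketched in user-space C roughly as
follows. This is only an illustration of the knob semantics, not the
kernel code: SEARCH_ALL, bounded_idle_search() and the idle[] array are
hypothetical stand-ins, and returning -1 stands for "no idle CPU found
within the budget" (the caller would then fall back to its default
choice).

```c
#include <limits.h>
#include <stdbool.h>

#define SEARCH_ALL INT_MAX  /* hypothetical encoding of the 'ALL' default */

/*
 * Scan at most max_search entries of idle[] (nr_cpus long), starting at
 * 'target' and wrapping around. Returns the first idle CPU found, or -1
 * if the budget is exhausted or no CPU is idle.
 */
static int bounded_idle_search(const bool *idle, int nr_cpus, int target,
                               int max_search)
{
    int i;

    /* max_search == 0 skips the loop; SEARCH_ALL never limits it. */
    for (i = 0; i < nr_cpus && i < max_search; i++) {
        int cpu = (target + i) % nr_cpus;

        if (idle[cpu])
            return cpu;
    }
    return -1;
}
```

The loop bound is the only place the knob appears, so 0 costs nothing
and any value at or above the domain size behaves like 'ALL'.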
IMHO the opposite use case for this feature (favour high throughput over
short wakeup latency (Facebook)) is already covered by the changes
introduced by commit 10e2f1acd010 ("sched/core: Rewrite and improve
select_idle_siblings()"), i.e. with the current implementation of sis().
It seems that they don't need an additional per-task feature on top of
the default system-wide runtime heuristics.