Scheduler wakeup path tuning surface: Use-Cases and Requirements

From: Patrick Bellasi
Date: Tue Jun 23 2020 - 03:29:41 EST



Since last year's OSPM Summit we have been developing the idea that the
task wakeup path could be better tuned for certain classes of workloads
and usage scenarios. Various people have shown interest in a possible
tuning interface for the scheduler wakeup path.


.:: The Problem
===============

The discussions we had so far [1] have not been effective in clearly
identifying whether a common tuning surface is possible. The last
discussion at this year's OSPM Summit [2,3] was also inconclusive and
left us with the message: start by collecting the requirements and then
see what interface fits them best.

The general consensus is that a unified interface can be challenging and
may not be feasible. However, generalisation is also valuable
and we should strive for it whenever possible.

Someone might think that we did not put enough effort into the analysis
of requirements. Perhaps what we have missed so far is a structured and
organised way to collect requirements, one which can also help in
factoring out the things they have in common.


.:: The Proposal
================

This thread aims at providing a guided template for the description of
different task wakeup use-cases. It does that by setting a series of
questions aimed at precisely describing what's "currently broken", what
we would like to have instead and how we could achieve it.

What we propose here is that, for each wakeup use-case, we start
by replying to this email to provide the required details/comments for
a predefined list of questions. This will generate independent
discussion threads. Each thread will be free to focus on a specific
proposal, but all the threads will still reason around a common set
of fundamental concepts.

The hope is that, by representing all the use-cases as sufficiently
detailed responses to a common set of questions, once the discussion
settles down, we can more easily verify if there are common things
surfacing which then can be naturally wrapped into a unified user-space
API.

A first use-case description, following the template guidelines, will
be posted shortly after this message. This will also provide an example
of how to use the template.

NOTE: Whenever required, pseudo-code or simplified C can be used.

I hope this can drive a fruitful discussion in preparation for LPC!

Best,
Patrick


---8<--- For templates submissions: reply only to the following ---8<---


.:: Scheduler Wakeup Path Requirements Collection Template
==========================================================

A) Name: unique one-liner name for the proposed use-case

B) Target behaviour: one paragraph to describe the wakeup path issue

C) Existing control paths: reference to code paths to justify B)

Assuming v5.6 as the reference kernel, this section should provide
links to code paths, e.g.

fair.c:3917
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?h=v5.6#n3917

Alternatively code snippets can be added inline, e.g.

	/*
	 * The 'current' period is already promised to the current tasks,
	 * however the extra weight of the new task will slow them down a
	 * little, place the new task so that it fits in the slot that
	 * stays open at the end.
	 */
	if (initial && sched_feat(START_DEBIT))
		vruntime += sched_vslice(cfs_rq, se);

NOTE: if the use-case exists only outside the mainline Linux kernel
this section can stay empty

D) Desired behaviour: one paragraph to describe the desired update

NOTE: the current mainline expression is assumed to be correct
for existing use-cases. Thus, here we are looking for run-time
tuning of those existing features.

E) Existing knobs (if any): reference to whatever existing tunable

Some features can already be tuned, but perhaps only via compile-time
knobs, SCHED_FEATs or system-wide tunables.
If that's the case, we should document them here and explain how they
currently work and what the implicit assumptions (if any) are, e.g.
what is the currently implemented scheduling policy/heuristic.

F) Existing knobs (if any): one paragraph description of the limitations

If the existing knobs are not up to the job for this use-case,
briefly explain here why. It could be because a tuning surface is
already there but it's hardcoded (e.g. a compile-time knob) or too
coarse-grained (e.g. a SCHED_FEAT).
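
To make the "too coarse-grained" point concrete, here is a minimal
user-space sketch of a SCHED_FEAT-like switch. It is not kernel code:
the feature names are borrowed from mainline, but the enum, the mask
variable and the helpers (feat_enabled(), feat_set(), place_vruntime())
are hypothetical stand-ins for the sched_feat() machinery.

```c
#include <stdbool.h>

/* Hypothetical feature indices; names borrowed from mainline. */
enum feat { FEAT_START_DEBIT, FEAT_WAKEUP_PREEMPTION, FEAT_NR };

/* One global mask: every task on the system sees the same value. */
static unsigned long feat_mask =
	(1UL << FEAT_START_DEBIT) | (1UL << FEAT_WAKEUP_PREEMPTION);

bool feat_enabled(enum feat f)
{
	return feat_mask & (1UL << f);
}

void feat_set(enum feat f, bool on)
{
	if (on)
		feat_mask |= 1UL << f;
	else
		feat_mask &= ~(1UL << f);
}

/*
 * Simplified model of the START_DEBIT snippet quoted in (C): the
 * placement debit is all-or-nothing and cannot differ between tasks.
 */
unsigned long place_vruntime(unsigned long vruntime, unsigned long vslice,
			     bool initial)
{
	if (initial && feat_enabled(FEAT_START_DEBIT))
		vruntime += vslice;
	return vruntime;
}
```

As far as I know, on a v5.6 kernel built with CONFIG_SCHED_DEBUG the
real switches are flipped by writing e.g. NO_START_DEBIT to
/sys/kernel/debug/sched_features, which likewise affects all tasks at
once.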

G) Proportionality Analysis: check the nature of the target behaviour

The goal here is to verify and discuss whether the behaviour (B) has a
proportional nature: different values of the control knobs (E) are
expected to produce different effects for (B).

Special care should be taken to check if the target behaviour has an
intrinsically "binary nature", i.e. only two values really make
sense. In this case it would be very useful to argue why a
generalisation towards non-binary behaviours does NOT make sense.

H) Range Analysis: identify meaningful ranges

If (G) was successful, i.e. there is a proportional correlation
between (E) and (B), discuss here a meaningful range for (E)
and (F).
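
The difference between a proportional and a binary knob, and the role
of a bounded range, can be sketched in simplified C as follows. All of
this is hypothetical: the [-20, 19] range is only borrowed from the
nice-like range proposed in [4], the "scan budget" effect is an
invented example, and the direction of the mapping is arbitrary.

```c
#define KNOB_MIN	(-20)	/* hypothetical lower bound */
#define KNOB_MAX	19	/* hypothetical upper bound */

/* Clamp a requested value into the range identified in (H). */
int knob_clamp(int val)
{
	if (val < KNOB_MIN)
		return KNOB_MIN;
	if (val > KNOB_MAX)
		return KNOB_MAX;
	return val;
}

/*
 * Proportional nature: the effect changes gradually across the range.
 * The effect is a made-up number of CPUs scanned at wakeup, going
 * linearly from nr_cpus at KNOB_MIN down to 1 at KNOB_MAX.
 */
int scan_budget_proportional(int val, int nr_cpus)
{
	int span = KNOB_MAX - KNOB_MIN;
	int pos = KNOB_MAX - knob_clamp(val);

	return 1 + (nr_cpus - 1) * pos / span;
}

/*
 * Binary nature: only two outcomes exist, so a wide range adds no
 * information and a flag would be enough.
 */
int scan_budget_binary(int val, int nr_cpus)
{
	return knob_clamp(val) < 0 ? nr_cpus : 1;
}
```

If a use-case can only be described by something like the second
function, that is a hint that a boolean flag fits better than a ranged
attribute.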

I) System-Wide tuning: which knobs are required

If required, list new tunables here, how they should be
exposed and (if required) which scheduling classes will be affected.

J) Per-Task tuning: which knobs are required

Describe which knobs should be added and through which task-specific
API (e.g. sched_setscheduler(), prctl(), ...) they should be exposed.
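
As an example of the level of detail that is useful here, this is a
simplified C sketch of how a per-task knob could be carried and
validated by a sched_setattr()-style backend. The "valid field" flag
loosely follows the way utilization clamps are passed with
SCHED_FLAG_UTIL_CLAMP_MIN/MAX; every name below (struct wakeup_attr,
WAKEUP_FLAG_BIAS, struct task_model, task_set_wakeup_attr()) and the
value range are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define WAKEUP_FLAG_BIAS	0x1	/* hypothetical: bias field is valid */
#define WAKEUP_BIAS_MIN		(-20)	/* hypothetical range */
#define WAKEUP_BIAS_MAX		19

struct wakeup_attr {			/* hypothetical user-space request */
	uint64_t flags;
	int32_t bias;
};

struct task_model {			/* stand-in for per-task state */
	int32_t wakeup_bias;
};

/* Returns 0 on success or a negative errno, like a syscall backend. */
int task_set_wakeup_attr(struct task_model *p, const struct wakeup_attr *attr)
{
	/* Unknown flags are rejected so they can be given a meaning later. */
	if (attr->flags & ~(uint64_t)WAKEUP_FLAG_BIAS)
		return -EINVAL;

	/* Nothing requested: keep the current per-task value. */
	if (!(attr->flags & WAKEUP_FLAG_BIAS))
		return 0;

	/* Out-of-range values are rejected, not silently clamped. */
	if (attr->bias < WAKEUP_BIAS_MIN || attr->bias > WAKEUP_BIAS_MAX)
		return -EINVAL;

	p->wakeup_bias = attr->bias;
	return 0;
}
```

Whether out-of-range requests should fail or be clamped is exactly the
kind of choice a use-case description should state explicitly.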

K) Task-Group tuning: which knobs are required

If the use-case can benefit from a task-group tuning, here it should
be **briefly described** how the expected behaviour can be mapped on a
cgroup v2 unified hierarchy.

NOTE: implementation details are not required but we should be able
to hint at which cgroup v2 resource distribution model [5]
should be applied.
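
For reference when picking a model, two of the distribution models
listed in [5] can be sketched in simplified C as below. This is only my
reading of the "Weights" and "Limits" descriptions, reduced to a single
parent/child step; the function names are hypothetical and nothing here
is taken from the cgroup implementation.

```c
/*
 * "Weights" model: a child gets the fraction of the parent's resource
 * given by its weight over the sum of the weights of all active
 * siblings (the child itself included).
 */
unsigned long weight_share(unsigned long parent_amount, unsigned int weight,
			   unsigned int weight_sum)
{
	if (!weight_sum)
		return 0;
	return parent_amount * weight / weight_sum;
}

/*
 * "Limits" model: a child can ask for any value, but what it actually
 * gets is capped by the effective value of its parent, so the tighter
 * ancestor limit wins.
 */
unsigned int limit_effective(unsigned int requested,
			     unsigned int parent_effective)
{
	return requested < parent_effective ? requested : parent_effective;
}
```

A knob with a proportional nature (G) can map to either model; a knob
with a binary nature probably maps more naturally to a limit-like
restriction imposed by the parent.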


---8<--- For templates submissions: exclude the following ---8<---


.:: References
==============

[1] [Discussion v2] Usecases for the per-task latency-nice attribute
Message-ID: 2bd46086-43ff-f130-8720-8eec694eb55b@xxxxxxxxxxxxx
https://lore.kernel.org/lkml/2bd46086-43ff-f130-8720-8eec694eb55b@xxxxxxxxxxxxx

[2] Latency Nice: Implementation and UseCases for Scheduler Optimizations
https://ospm.lwn.net/playback/presentation/2.0/playback.html?meetingId=380dd88f044f67ee4c94d0a2a4fb7c3f46cb6391-1589459486615&t=42m37s

[3] LWN: The many faces of "latency nice"
https://lwn.net/Articles/820659

[4] [PATCH v5 0/4] Introduce per-task latency_nice for scheduler hints
Message-ID: 20200228090755.22829-1-parth@xxxxxxxxxxxxx
https://lore.kernel.org/lkml/20200228090755.22829-1-parth@xxxxxxxxxxxxx

[5] Control Group v2: Resource Distribution Models
https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst