Re: Usecases for the per-task latency-nice attribute

From: Parth Shah
Date: Thu Sep 19 2019 - 03:02:24 EST




On 9/18/19 7:48 PM, Patrick Bellasi wrote:
>
> On Wed, Sep 18, 2019 at 13:41:04 +0100, Parth Shah wrote...
>
>> Hello everyone,
>
> Hi Parth,
> thanks for starting this discussion.
>
> [ + patrick.bellasi@xxxxxxxxxx ] my new email address, since with
> @arm.com I will not be reachable anymore starting next week.
>

Noted. I will send a new version with a summary of the whole discussion and
add more people to CC. I will use your new address there; thanks for notifying me.

>> As per the discussion in LPC2019, a new per-task property like latency-nice
>> can be useful in certain scenarios. The scheduler can take better decisions
>> by knowing the latency requirements of a task from the end-user itself.
>>
>> There has already been an effort from Subhra to introduce task
>> latency-nice [1] values, and several possibilities have been seen where this
>> type of interface can be used.
>>
>> To the best of my understanding of the discussion on the mail thread and
>> at LPC2019, it seems that there are two dilemmas:
>>
>> 1. Name: What should be the name for such attr for all the possible usecases?
>> =============
>> Latency nice is the proposed name as of now where the lower value indicates
>> that the task doesn't care much for the latency
>
> If by "lower value" you mean -19 (in the proposed [-20,19] range), then
> I think the meaning should be the opposite.
>

Oops, my bad. I meant to say higher value but somehow missed that
latency-nice should be the opposite of latency sensitivity.

For the rest of the discussion, I will take -19 to be the lowest value
(latency sensitive) and +20 to be the greatest value (does not care
about latency) if the range is [-19,20].

> A -19 latency-nice task is a task which is not willing to give up
> latency. For those tasks, for example, we want to reduce the wake-up
> latency as much as possible.
>
> This will keep its semantics aligned with those of process niceness values,
> which range from -20 (most favourable to the process) to 19 (least
> favourable to the process).

Totally agreed.

>
>> and we can spend some more time in the kernel to decide a better
>> placement of a task (to save time, energy, etc.)
>
> Tasks with a high latency-nice value (e.g. 19) are "less sensitive to
> latency". These are tasks we want to optimize mainly for throughput and
> thus, for example, we can spend some more time to find a better task
> placement at wakeup time.
>
> Does that make sense?

Correct. Task placement is one such optimization; it can benefit both
the server and embedded worlds by saving power without compromising much on
performance.
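
For illustration, the extra placement effort could be scaled with the hint
itself. Here is a minimal sketch (not actual kernel code; the helper name and
the scan-depth formula are made up):

/*
 * Illustrative sketch: scale the wakeup-time CPU search effort by the
 * task's latency tolerance.
 */
static int nr_cpus_to_scan(int latency_nice, int nr_cpus)
{
	/*
	 * Latency-sensitive tasks (latency_nice near -20): scan only a
	 * few CPUs so the wakeup stays fast.  Latency-tolerant tasks
	 * (near +19): afford a wider scan to find a better, e.g. more
	 * power-efficient, placement.
	 */
	int scan = nr_cpus * (latency_nice + 21) / 40;

	return scan > 0 ? scan : 1;
}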

>
>> But there seems to be a bit of confusion on whether we want biasing as well
>> (latency-biased) or something similar, in which case "latency-nice" may
>> confuse the end-user.
>
> AFAIU PeterZ's point was "just" that if we call it "-nice" it has to
> behave like "nice values" to avoid confusing users. But, if we come up
> with a different naming maybe we will have more freedom.
>
> Personally, I like both "latency-nice" and "latency-tolerant", where:
>
> - latency-nice:
> should be easier to understand, building on pre-existing concepts
>
> - latency-tolerant:
> decouples its meaning a bit from the niceness, thus giving maybe a bit
> more freedom in its complete definition and perhaps avoiding any
> possible interpretation confusion like the one I commented on above.
>
> Fun fact: there was also the latency-nasty proposal from PaulMK :)
>

Cool. In that sense, latency-tolerant seems to be more flexible, covering
the multiple kinds of functionality a scheduler can provide with such userspace hints.


>> 2. Value: What should be the range of possible values supported by this new
>> attr?
>> ==============
>> The possible values of such a task attribute still need community attention.
>> Do we need a range of values, or are binary/ternary values sufficient?
>> Also, should it be signed or unsigned, and how wide should the variable be
>> (u64, s32, etc.)?
>
> AFAIR, the proposals on the table are essentially two:
>
> A) use a [-20,19] range
>
> Which has similarities with the niceness concept and gives a minimal
> continuous range. This can come in handy for things like scaling the
> vruntime normalization [3]
>
> B) use some sort of "profile tagging"
> e.g. background, latency-sensitive, etc...
>
> That is, if I correctly got what PaulT was proposing toward the end of
> the discussion at LPC.
>

If I got it right, then for option B this attr could be used as a
latency_flag just like the per-process flags (e.g. PF_IDLE), in which case we
could piggyback on p->flags itself. Still, I would prefer the range unless we
have multiple usecases which cannot get the best out of a range.
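
Just to make the comparison concrete, here is an illustrative sketch; the
PF_* bit, the struct and both helpers are hypothetical (task_info stands in
for task_struct):

#include <stdbool.h>

#define PF_LATENCY_TOLERANT	0x01000000	/* hypothetical PF_* bit */

struct task_info {			/* stand-in for task_struct */
	unsigned int flags;		/* stand-in for p->flags */
	int latency_nice;		/* [-20, 19], -20 = most latency sensitive */
};

/* With a flag, only a binary decision is possible... */
static bool latency_tolerant_flag(const struct task_info *p)
{
	return p->flags & PF_LATENCY_TOLERANT;
}

/* ...while a range also supports thresholds and proportional scaling. */
static bool latency_tolerant_range(const struct task_info *p, int threshold)
{
	return p->latency_nice >= threshold;
}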

> This last option deserves better exploration.
>
> At first glance I'm more for option A; I see a range as something that:
>
> - gives us a bit of flexibility in terms of the possible internal
> usages of the actual value
>
> - better supports some kind of linear/proportional mapping
>
> - still supports a "profile tagging" by (possibly) exposing to
> user-space some kind of system-wide knobs defining thresholds that
> map the continuous value into a "profile"
> e.g. latency-nice >= 15: use SCHED_BATCH
>

+1, a good list of reasons to support a range for latency-<whatever>.
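
To make the threshold mapping concrete, it could look roughly like the
sketch below; the threshold value and the helper name are made up, and only
the SCHED_OTHER/SCHED_BATCH pair is shown:

#define _GNU_SOURCE
#include <sched.h>	/* SCHED_OTHER, SCHED_BATCH */

#define LATENCY_NICE_BATCH_THRESHOLD	15	/* hypothetical system-wide knob */

static int policy_from_latency_nice(int latency_nice)
{
	if (latency_nice >= LATENCY_NICE_BATCH_THRESHOLD)
		return SCHED_BATCH;	/* throughput oriented */

	return SCHED_OTHER;		/* default policy */
}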

> In the following discussion I'll call this approach "threshold based
> profiling".
>
>
>> This mail is to initiate the discussion regarding the possible usecases of
>> such a per-task attribute and to come up with a specific name and value for
>> the same.
>>
>> Hopefully, those interested will lay out the usecases for which this new
>> attr can potentially help in solving or optimizing.
>
> +1
>
>> Well, to start with, here is my usecase.
>>
>> -------------------
>> **Usecases**
>> -------------------
>>
>> $> TurboSched
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> We should really come up with a different name, since "jitter" clashes
> with other RT-related concepts.
>

I agree. Based on the LPC discussion and comments from tglx, I am happy to
rename it to whatever feels functionally correct and non-confusing to the
end-user.

> Maybe we don't even need a name at all; the other two attributes you
> specify are good enough to identify those tasks: they are just "small
> background" tasks.
>
> small : because of their small util_est value
> background : because of their high latency-nice value
>

Correct. If we have latency-nice hints plus utilization, then we can
classify those tasks for task packing.

>> already active core and thus refrains from waking up a new core if
>> possible. This requires tagging tasks from userspace, hinting which
>> tasks are unimportant so that waking up a new core to minimize
>> latency is unnecessary for them.
>> As per the discussion on the posted RFC, it would be appropriate to use the
>> task latency property, where a task with the highest latency-nice value can
>> be packed.
>
> We should better define here what you mean by "highest" latency-nice
> value; do you really mean the top of the range, e.g. 19?
>

Yes, I mean +19 (or +20, whichever ends up being the top of the range) here,
i.e. a task which does not care about latency.

> Or...
>
>> But for this specific use-case, having just a binary value to know which
>> task is latency-sensitive and which is not would be sufficient, but having a
>> range is also a good way to go, where above some threshold the task can be
>> packed.
>
> ... yes, maybe we can reason about a "threshold based profiling" where,
> for example, something like:
>
> /proc/sys/kernel/sched_packing_util_max : 200
> /proc/sys/kernel/sched_packing_latency_min : 17
>
> means that a task with latency-nice >= 17 and util_est <= 200 will be packed?
>

Yes, something like that.
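
For instance, the packing check could end up looking like the sketch below
(not actual kernel code: task_info, the field names and the helper are all
hypothetical, with the two limits coming from the proposed sysctls):

#include <stdbool.h>

static unsigned int sysctl_sched_packing_util_max = 200;	/* 0..1024 scale */
static int sysctl_sched_packing_latency_min = 17;		/* [-20, 19] */

struct task_info {
	unsigned int util_est;	/* estimated utilization of the task */
	int latency_nice;	/* latency tolerance hint */
};

static bool task_can_be_packed(const struct task_info *p)
{
	/* Pack only small (low utilization), latency-tolerant tasks. */
	return p->util_est <= sysctl_sched_packing_util_max &&
	       p->latency_nice >= sysctl_sched_packing_latency_min;
}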

>
> $> Wakeup path tunings
> ==========================
>
> Some additional possible use-cases were already discussed in [3]:
>
> 1. dynamically tune the policy of a task among SCHED_{OTHER,BATCH,IDLE}
> depending on crossing certain pre-configured threshold of latency
> niceness.
>
> 2. dynamically bias the vruntime updates we do in place_entity()
> depending on the actual latency niceness of a task.
>
> PeterZ thinks this is dangerous but that we can "(carefully) fumble a
> bit there."
>
> 3. bias the decisions we take in check_preempt_tick() still depending
> on a relative comparison of the current and wakeup task latency
> niceness values.
>

Nice. Thanks for listing out the usecases.

I guess a latency_flag would be difficult to use for usecases 2 and 3, but a
range will work for all three; see the sketch below for usecase 3.
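
An illustrative sketch for usecase 3 (not actual kernel code: the entity
struct stands in for sched_entity and the /40 scaling is arbitrary; it only
shows how a ranged value composes with this kind of arithmetic, which a
binary flag cannot do):

#include <stdbool.h>

struct entity {
	long long vruntime;	/* stand-in for se->vruntime */
	int latency_nice;	/* [-20, 19] */
};

static bool should_preempt(const struct entity *curr,
			   const struct entity *wakee,
			   long long gran)
{
	/*
	 * A wakee more latency-sensitive than current (negative delta)
	 * shrinks the granularity and makes preemption easier; a more
	 * latency-tolerant wakee widens it.
	 */
	int delta = wakee->latency_nice - curr->latency_nice;	/* [-39, 39] */
	long long biased_gran = gran + gran * delta / 40;

	return curr->vruntime - wakee->vruntime > biased_gran;
}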

>> References:
>> ===========
>> [1]. https://lkml.org/lkml/2019/8/30/829
>> [2]. https://lkml.org/lkml/2019/7/25/296
>
> [3]. Message-ID: <20190905114709.GM2349@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
> https://lore.kernel.org/lkml/20190905114709.GM2349@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
>
> Best,
> Patrick
>

Thanks,
Parth