Re: [Discussion v2] Usecases for the per-task latency-nice attribute

From: Parth Shah
Date: Mon Oct 07 2019 - 04:46:49 EST




On 10/2/19 9:41 PM, David Laight wrote:
> From: Parth Shah
>> Sent: 30 September 2019 11:44
> ...
>> 5> Separating AVX512 tasks and latency sensitive tasks on separate cores
>> ( -Tim Chen )
>> ===========================================================================
>> Another usecase we are considering is to segregate those workload that will
>> pull down core cpu frequency (e.g. AVX512) from workload that are latency
>> sensitive. There are certain tasks that need to provide a fast response
>> time (latency sensitive) and they are best scheduled on cpu that has a
>> lighter load and not have other tasks running on the sibling cpu that could
>> pull down the cpu core frequency.
>>
>> Some users are running machine learning batch tasks with AVX512, and have
>> observed that these tasks affect the tasks needing a fast response. They
>> have to rely on manual CPU affinity to separate these tasks. With
>> appropriate latency hint on task, the scheduler can be taught to separate them.
>
> Has this been diagnosed properly?
> I can't really see how the frequency drop from AVX512 significantly affects latency.
> Most tasks that require low latency probably don't do a lot of work.
> It is much more likely that the latency issues happen because the AVX512 tasks
> are doing very few system calls so can't be pre-empted even by a high priority task.> This 'feature' is hinted by this:
>> 2> TurboSched
>> ( -Parth Shah )
>> ====================
>> TurboSched [2] tries to minimize the number of active cores in a socket by
>> packing an un-important and low-utilization (named jitter) task on an
>> already active core and thus refrains from waking up of a new core if
>> possible.
>

You are correct as both approach contradict each other in some sense.
But what TurboSched tried to achieve is doing task packing only for the
tasks classified by user as *latency in-sensitive*. Whereas, IIUC, what Tim
proposes here is to not pack *latency sensitive* tasks and I guess that
align with the TurboSched approach as well, isn't it?

Probably @Tim can throw some light on this for better clarification?

> Consider this example of a process that requires low latency (sub 1ms would be good):
> - A hardware interrupt (or timer interrupt) wakes up on thread.
> - When that thread wakes it wakes up other threads that are sleeping.
> - All the threads 'beaver away' for a few ms (processing RTP and other audio).
> - They all sleep for the rest of a 10ms period.
>
> The affinities are set so each thread runs on a separate cpu, and all are SCHED_RR.
> Now loop all the cpus in userspace (run: while :; do :; done) and see what happens to the latencies.
> You really want the SCHED_RR threads to immediately pre-empt the running processes.
> But I suspect nothing happens until a timer interrupt to the target cpu.
>

This is a good corner case where scheduler can be optimized further, and
the per-task attribute like the latency-nice can be of some help. Maybe we
can reduce the vslice of a task not having any latency constraints in the
time when any RR/RT tasks are present.

> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>