Re: [PATCH v2 6/8] sched/idle: Move busy_cpu accounting to idle callback

From: Aubrey Li
Date: Fri May 14 2021 - 00:12:37 EST


On 5/13/21 3:31 PM, Srikar Dronamraju wrote:
> * Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx> [2021-05-12 16:08:24]:
>
>> On 5/7/21 12:45 AM, Srikar Dronamraju wrote:
>>> Currently we account nr_busy_cpus in no_hz idle functions.
>>> There is no reason why nr_busy_cpus should be updated in NO_HZ_COMMON
>>> configs only. Also, the scheduler can mark a CPU as non-busy as soon as
>>> an idle class task starts to run. The scheduler can then mark a CPU as
>>> busy as soon as it's woken up from idle or a new task is placed on its
>>> runqueue.
>>
>> IIRC, we discussed this before, if a SCHED_IDLE task is placed on the
>> CPU's runqueue, this CPU should be still taken as a wakeup target.
>>
>
> Yes, this CPU is still a wakeup target. It's only when this CPU is busy
> that we look at other CPUs.
>
>> Also, for frequently context-switching tasks with very short idle
>> periods, it's expensive for the scheduler to mark idle/busy every time.
>> That's why my patch marks idle every time but ratelimits marking busy
>> to the scheduler tick.
>>
>
> I have tried a few tasks with very short idle times, and updating nr_busy
> every time doesn't seem to have an impact. In fact, it seems to help in
> picking the idler LLC more often.
>
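
To make the cost concern concrete, here is a toy userspace model of the
"mark idle every time, mark busy ratelimited from the tick" idea quoted
above. It is only a sketch and not the actual patch: the class name
BusyAccounting, the busy_interval parameter, and the counter_writes
statistic are all hypothetical names made up for illustration.

```python
class BusyAccounting:
    """Toy model of per-LLC nr_busy_cpus accounting (not kernel code)."""

    def __init__(self, nr_cpus, busy_interval):
        self.busy = [True] * nr_cpus          # every CPU starts out marked busy
        self.nr_busy_cpus = nr_cpus           # shared counter for one LLC domain
        self.busy_interval = busy_interval    # hypothetical: min ticks between busy marks
        self.last_busy_update = [0] * nr_cpus
        self.counter_writes = 0               # how often the shared counter was written

    def enter_idle(self, cpu):
        # Idle is marked immediately, every time the CPU goes idle.
        if self.busy[cpu]:
            self.busy[cpu] = False
            self.nr_busy_cpus -= 1
            self.counter_writes += 1

    def tick(self, cpu, now):
        # Assumed to be called from the periodic tick while a task is running.
        # Busy is marked at most once per busy_interval ticks.
        if self.busy[cpu]:
            return
        if now - self.last_busy_update[cpu] < self.busy_interval:
            return
        self.busy[cpu] = True
        self.nr_busy_cpus += 1
        self.counter_writes += 1
        self.last_busy_update[cpu] = now


if __name__ == "__main__":
    acct = BusyAccounting(nr_cpus=4, busy_interval=10)
    # Many short idle periods between two ticks cost one counter write.
    for _ in range(100):
        acct.enter_idle(0)
    acct.tick(0, now=10)
    print(acct.nr_busy_cpus, acct.counter_writes)
```

In this model a CPU that bounces in and out of idle many times between
ticks writes the shared counter once on the first idle entry and once
more when the tick marks it busy again, instead of twice per idle
period.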

How many CPUs are in your LLC?

The results below are from a system with 192 CPUs across 4 nodes, each
node having 48 CPUs in its LLC domain.

For netperf, both the TCP_RR and UDP_RR cases show a notable change under
2x overcommit (thread-384), though that may not be interesting.


hackbench (48 tasks per group)
=========
case            load        baseline(std%) compare%( std%)
process-pipe    group-1     1.00 (  6.74)   -4.61 (  8.97)
process-pipe    group-2     1.00 ( 36.84)  +11.53 ( 26.35)
process-pipe    group-3     1.00 ( 24.97)  +12.21 ( 19.05)
process-pipe    group-4     1.00 ( 18.27)   -2.62 ( 17.60)
process-pipe    group-8     1.00 (  4.33)   -2.22 (  3.08)
process-sockets group-1     1.00 (  7.88)  -20.26 ( 15.97)
process-sockets group-2     1.00 (  5.38)  -19.41 (  9.25)
process-sockets group-3     1.00 (  4.22)   -5.70 (  3.00)
process-sockets group-4     1.00 (  1.44)   -1.80 (  0.79)
process-sockets group-8     1.00 (  0.44)   -2.86 (  0.06)
threads-pipe    group-1     1.00 (  5.43)   -3.69 (  3.59)
threads-pipe    group-2     1.00 ( 18.00)   -2.69 ( 16.79)
threads-pipe    group-3     1.00 ( 21.72)   -9.01 ( 21.34)
threads-pipe    group-4     1.00 ( 21.58)   -6.43 ( 16.26)
threads-pipe    group-8     1.00 (  3.05)   -0.15 (  2.31)
threads-sockets group-1     1.00 ( 14.51)   -5.35 ( 13.85)
threads-sockets group-2     1.00 (  3.97)  -24.15 (  4.40)
threads-sockets group-3     1.00 (  4.97)   -9.05 (  2.46)
threads-sockets group-4     1.00 (  1.98)   -3.44 (  0.49)
threads-sockets group-8     1.00 (  0.37)   -2.13 (  0.20)

netperf
=======
case            load        baseline(std%) compare%( std%)
TCP_RR          thread-48   1.00 (  3.84)   -2.20 (  3.83)
TCP_RR          thread-96   1.00 (  5.22)   -4.97 (  3.90)
TCP_RR          thread-144  1.00 (  7.97)   -0.75 (  4.39)
TCP_RR          thread-192  1.00 (  3.03)   -0.67 (  4.40)
TCP_RR          thread-384  1.00 ( 22.27)  -14.15 ( 36.28)
UDP_RR          thread-48   1.00 (  2.08)   -0.39 (  2.29)
UDP_RR          thread-96   1.00 (  2.48)   -4.26 ( 16.06)
UDP_RR          thread-144  1.00 ( 49.50)   -3.28 ( 34.86)
UDP_RR          thread-192  1.00 (  6.39)   +8.07 ( 88.15)
UDP_RR          thread-384  1.00 ( 31.54)  -12.76 ( 35.98)
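
For reference, the "2x overcommit" remark maps onto the netperf load
column as follows. This is a small sketch that assumes "thread-N" means
N netperf threads competing for the 192 CPUs; the helper name
overcommit_ratio is made up for illustration.

```python
NR_CPUS = 192  # total CPUs on the test system described above

# Thread counts taken from the netperf "load" column (thread-N).
NETPERF_LOADS = [48, 96, 144, 192, 384]


def overcommit_ratio(threads, nr_cpus=NR_CPUS):
    """Return threads per CPU for a given load (assumes one thread per 'thread-N' unit)."""
    return threads / nr_cpus


if __name__ == "__main__":
    for threads in NETPERF_LOADS:
        print(f"thread-{threads}: {overcommit_ratio(threads):.2f}x")
```

Under that assumption only thread-384 exceeds one thread per CPU, which
is the 2x overcommit case.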