Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC

From: Dietmar Eggemann
Date: Wed May 05 2021 - 08:30:11 EST


On 03/05/2021 13:35, Song Bao Hua (Barry Song) wrote:

[...]

>> From: Song Bao Hua (Barry Song)

[...]

>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]

[...]

>>> On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
>>>
>>> [...]
>>>
>>>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
>>>>>
>>>>> [...]
>>>>>
>>>>>>>> On 20/04/2021 02:18, Barry Song wrote:

[...]

>
> On the other hand, according to "sched: Implement smarter wake-affine logic"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
>
> A proper factor in wake_wide mainly benefits 1:n workloads like postgresql/pgbench.
> So using the smaller cluster size as the factor might help make wake_affine false
> and so improve pgbench.
>
> According to the commit log, the commit made the biggest improvement when
> clients = 2*cpus. In my case, it should be clients=48 for a machine whose LLC
> size is 24.
>
> In Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20 pgbench"
> under two different scenarios:
> 1. page cache always hit, so no real I/O for database read
> 2. echo 3 > /proc/sys/vm/drop_caches
>
> For case 1, using cluster_size and using llc_size result in a similar
> tps of ~108000, with all 24 cpus at 100% cpu utilization.
>
> For case 2, using llc_size still shows better performance.
>
> tps for each test round (cluster size as factor in wake_wide):
> 1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294 1539.877146
> avg tps = 1464
>
> tps for each test round (llc size as factor in wake_wide):
> 1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814 1876.802168
> avg tps = 1641 (+12%)
>
> So it seems using cluster_size as factor in "slave >= factor && master >= slave *
> factor" isn't a good choice, at least for my machine.
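
A quick cross-check of the averages quoted above, as a minimal Python
sketch. The per-round tps values are copied from the two lists; the
variable names are mine and purely illustrative:

```python
from statistics import mean

# Per-round tps with cluster size as the wake_wide factor (from the quoted mail).
cluster_tps = [1398.450887, 1275.020401, 1632.542437, 1412.241627,
               1611.095692, 1381.354294, 1539.877146]

# Per-round tps with LLC size as the wake_wide factor (from the quoted mail).
llc_tps = [1718.402983, 1443.169823, 1502.353823, 1607.415861,
           1597.396924, 1745.651814, 1876.802168]

cluster_avg = mean(cluster_tps)
llc_avg = mean(llc_tps)

# Relative gain of the LLC-size factor over the cluster-size factor, in percent.
gain_pct = (llc_avg / cluster_avg - 1) * 100

print(f"cluster avg = {cluster_avg:.1f}, llc avg = {llc_avg:.1f}, "
      f"gain = {gain_pct:.1f}%")
```

This agrees with the quoted 1464 and 1641 (truncated) and the roughly 12%
difference. With only seven rounds per configuration and a fairly wide
spread, it is an indication rather than a firm result.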

So SD size = 4 (instead of 24) seems to be too small for `-c 48`.
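
To make the effect of the factor concrete, here is a small Python model
of the condition quoted above ("slave >= factor && master >= slave *
factor"). It is only a sketch of the heuristic, not the kernel code: the
function signature and the wakee_flips numbers are made up for
illustration, and the decay of wakee_flips over time is ignored.

```python
def wake_wide(waker_flips: int, wakee_flips: int, factor: int) -> bool:
    """Model of the wake_wide() condition.

    Returns True when the wakeup should go "wide", i.e. when the
    wake_affine path is skipped.
    """
    master, slave = waker_flips, wakee_flips
    # The task with the larger flip count is treated as the master.
    if master < slave:
        master, slave = slave, master
    return slave >= factor and master >= slave * factor


# Hypothetical flip counts for a 1:n pattern (one dispatcher, many workers).
dispatcher_flips, worker_flips = 100, 5

# factor = 4 (cluster size): 5 >= 4 and 100 >= 20, so the wakeup goes wide.
print(wake_wide(dispatcher_flips, worker_flips, factor=4))

# factor = 24 (LLC size): 5 < 24, so the wakeup stays affine.
print(wake_wide(dispatcher_flips, worker_flips, factor=24))
```

With factor = 4 the thresholds are slave >= 4 and master >= 4 * slave, so
far more wakeups are classified as wide than with factor = 24, where the
master needs at least 24 * 24 = 576 flips.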

Just curious, have you seen a benefit from using wake_wide with SD size =
24 (LLC) compared to not using it at all?