Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

From: Dietmar Eggemann
Date: Fri Mar 31 2023 - 11:48:56 EST


On 31/03/2023 06:06, Aaron Lu wrote:
> Hi Daniel,
>
> Thanks for taking a look.
>
> On Thu, Mar 30, 2023 at 03:51:57PM -0400, Daniel Jordan wrote:
>> On Thu, Mar 30, 2023 at 01:46:02PM -0400, Daniel Jordan wrote:
>>> Hi Aaron,
>>>
>>> On Wed, Mar 29, 2023 at 09:54:55PM +0800, Aaron Lu wrote:
>>>> On Wed, Mar 29, 2023 at 02:36:44PM +0200, Dietmar Eggemann wrote:
>>>>> On 28/03/2023 14:56, Aaron Lu wrote:
>>>>>> On Tue, Mar 28, 2023 at 02:09:39PM +0200, Dietmar Eggemann wrote:
>>>>>>> On 27/03/2023 07:39, Aaron Lu wrote:

[...]

>>> AMD EPYC 7J13 64-Core Processor
>>> 2 sockets * 64 cores * 2 threads = 256 CPUs
>
> I have a vague memory that AMD machines have a smaller LLC and that the
> number of cpus sharing an LLC is also small, 8-16?
>
> I tend to think the number of cpus per LLC plays a role here, since that's
> the domain in which an idle cpu is searched for at task wakeup time.
>
>>>
>>> sysbench: nr_threads=256
>>>
>>> All observability data was taken at one minute in and using one tool at
>>> a time.
>>>
>>> @migrations[1]: 1113
>>> @migrations[0]: 6152
>>> @wakeups[1]: 8871744
>>> @wakeups[0]: 9773321

Just a thought: could the different behaviour come from different
CPU numbering schemes (consecutive vs. alternating across NUMA nodes)?

(1) My Arm server:

$ numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95


(2) Intel(R) Xeon(R) Silver 4314

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
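
In case it helps others compare their machines, here is a rough sketch of
how the two layouts above could be told apart from `numactl -H` output.
It is only an illustration: the helper names parse_numactl() and
numbering_scheme() are made up for this mail, and the stride check is an
assumption that will report "other" for layouts it does not recognise
(e.g. SMT siblings numbered in a second block on the same node).

```python
import re


def parse_numactl(text):
    """Return {node_id: [cpu, ...]} from the 'node N cpus:' lines."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"\s*node (\d+) cpus:\s*(.*)$", line)
        if m:
            nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes


def numbering_scheme(nodes):
    """Classify CPU numbering by the stride between CPUs within each node.

    Hypothetical heuristic: stride 1 everywhere means consecutive
    numbering, stride == number of nodes means CPUs alternate across
    nodes, anything else is reported as "other".
    """
    nr_nodes = len(nodes)
    strides = set()
    for cpus in nodes.values():
        strides.update(b - a for a, b in zip(cpus, cpus[1:]))
    if strides == {1}:
        return "consecutive"
    if nr_nodes > 1 and strides == {nr_nodes}:
        return "alternate"
    return "other"
```

Feeding it the two outputs above should classify (1) as "consecutive"
and (2) as "alternate", e.g. via
numbering_scheme(parse_numactl(subprocess.check_output(["numactl", "-H"], text=True))).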

[...]