Re: [PATCH v3] lib: optimize cpumask_local_spread()
From: Shaokun Zhang
Date: Tue Nov 12 2019 - 21:46:22 EST
Hi Michal,
On 2019/11/12 19:56, Michal Hocko wrote:
> On Mon 11-11-19 10:02:37, Shaokun Zhang wrote:
>> Hi Michal,
>>
>> On 2019/11/8 18:31, Michal Hocko wrote:
>>> This changelog looks better, thanks! I still have some questions though.
>>> Btw. cpumask_local_spread is used by the networking code but I do not
>>> see net guys involved (Cc netdev)
>>
>> Oh, I forgot to involve the net guys, sorry.
>>
>>>
>>> On Thu 07-11-19 09:44:08, Shaokun Zhang wrote:
>>>> From: yuqi jin <jinyuqi@xxxxxxxxxx>
>>>>
>>>> In the multi-processors and NUMA system, I/O driver will find cpu cores
>>>> that which shall be bound IRQ. When cpu cores in the local numa have
>>>> been used, it is better to find the node closest to the local numa node,
>>>> instead of choosing any online cpu immediately.
>>>>
>>>> On Huawei Kunpeng 920 server, there are 4 NUMA node(0 -3) in the 2-cpu
>>>> system(0 - 1).
>>>
>>> Please send a topology of this server (numactl -H).
>>>
>>
>> available: 4 nodes (0-3)
>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>> node 0 size: 63379 MB
>> node 0 free: 61899 MB
>> node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
>> node 1 size: 64509 MB
>> node 1 free: 63942 MB
>> node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
>> node 2 size: 64509 MB
>> node 2 free: 63056 MB
>> node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
>> node 3 size: 63997 MB
>> node 3 free: 63420 MB
>> node distances:
>> node 0 1 2 3
>> 0: 10 16 32 33
>> 1: 16 10 25 32
>> 2: 32 25 10 16
>> 3: 33 32 16 10
>>
>>>> We perform PS (parameter server) business test, the
>>>> behavior of the service is that the client initiates a request through
>>>> the network card, the server responds to the request after calculation.
>>>
>>> Is the benchmark any ublicly available?
>>>
>>
>> Sorry, the PS which we test is not open, but I think redis is the same as PS
>> on the macro level. When there are both 24 redis servers on node2 and node3.
>> if the 24-47 irqs and xps of NIC are not bound to node3, the redis servers
>> on node3 will not performance good.
>
> Are there any other benchmarks showing improvements?
>
Sorry, I don't have it. The issue is clear and the patch is helpful for the
actual Parameter Server and Redis test.
>>>> When two PS processes run on node2 and node3 separately and the
>>>> network card is located on 'node2' which is in cpu1, the performance
>>>> of node2 (26W QPS) and node3 (22W QPS) was different.
>>>> It is better that the NIC queues are bound to the cpu1 cores in turn,
>>>> then XPS will also be properly initialized, while cpumask_local_spread
>>>> only considers the local node. When the number of NIC queues exceeds
>>>> the number of cores in the local node, it returns to the online core
>>>> directly. So when PS runs on node3 sending a calculated request,
>>>> the performance is not as good as the node2. It is considered that
>>>> the NIC and other I/O devices shall initialize the interrupt binding,
>>>> if the cores of the local node are used up, it is reasonable to return
>>>> the node closest to it.
>>>
>>> Can you post cpu affinities before and after this patch?
>>>
>>
>> Before this patch
>> Euler:/sys/bus/pci/devices/0000:7d:00.2 # cat numa_node
>> 2
>> Euler:~ # cat /proc/irq/345/smp_affinity #IRQ0
>> 00000000,00010000,00000000
>
> This representation is awkward to parse. Could you add smp_affinity_list
> please? It would save quite some head scratching.
>
before patch
Euler:/sys/bus/pci/devices/0000:7d:00.2 # cat numa_node
2
Euler:/sys/bus/pci # cat /proc/irq/345/smp_affinity_list
48
Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list
0
Euler:/sys/bus/pci # cat /proc/irq/393/smp_affinity_list
24
Euler:/sys/bus/pci #
after patch
Euler:/sys/bus/pci/devices/0000:7d:00.2 # cat numa_node
2
Euler:/sys/bus/pci # cat /proc/irq/345/smp_affinity_list
48
Euler:/sys/bus/pci # cat /proc/irq/369/smp_affinity_list
72
Euler:/sys/bus/pci # cat /proc/irq/393/smp_affinity_list
24
Euler:/sys/bus/pci #
Thanks,
Shaokun