Re: [FIX PATCH 2/2] mm/page_alloc: Use accumulated load when building node fallback list

From: Bharata B Rao
Date: Fri Sep 03 2021 - 00:44:24 EST



On 8/30/2021 5:46 PM, Bharata B Rao wrote:
> From: Krupa Ramakrishnan <krupa.ramakrishnan@xxxxxxx>
>
> In build_zonelists(), when the fallback list is built for the nodes,
> the node load gets reinitialized during each iteration. As a result,
> nodes at the same distance occupy the same slot in different node
> fallback lists instead of appearing in the intended round-robin
> order, so one node gets picked for allocation more often than the
> other nodes at the same distance.
>
> As an example, consider a 4 node system with the following distance
> matrix.
>
> Node    0   1   2   3
> ----------------------
> 0      10  12  32  32
> 1      12  10  32  32
> 2      32  32  10  12
> 3      32  32  12  10
>
> For this case, the node fallback list gets built like this:
>
> Node  Fallback list
> ---------------------
> 0     0 1 2 3
> 1     1 0 3 2
> 2     2 3 0 1
> 3     3 2 0 1  <-- Unexpected fallback order

FWIW, for a dual-socket 8-node system with the following distance matrix,

node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

the fallback lists look like this:

Before
=======
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 0 1 2 3
Fallback order for Node 6: 6 7 4 5 0 1 2 3
Fallback order for Node 7: 7 4 5 6 0 1 2 3

After the fix
==============
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 1 2 3 0
Fallback order for Node 6: 6 7 4 5 2 3 0 1
Fallback order for Node 7: 7 4 5 6 3 0 1 2

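Before the fix, nodes 4-7 all fall back to node 0 first once their own
socket is exhausted; after the fix, the first remote node rotates
(0, 1, 2, 3). So the problem becomes more pronounced on bigger NUMA
systems.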
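
In case it helps anyone reproduce the tables above without booting a
kernel, here is a minimal user-space sketch of the ordering logic. It is
only a model, loosely based on build_zonelists() and
find_next_best_node(); it is not the kernel code. Assumptions made for
illustration: MAX_NODES, SCALE, build_fallback_lists() and the
"accumulate" flag are hypothetical names, every node is assumed to have
both memory and CPUs (so the penalty for nodes with CPUs is left out),
and the distance matrix is passed as a flat row-major array.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_NODES 8                 /* hypothetical cap for this sketch */
#define SCALE (MAX_NODES * 64)      /* stand-in for MAX_NODE_LOAD * MAX_NUMNODES */

static int node_load[MAX_NODES];

/* Pick the closest unused node; node_load only breaks ties within a distance. */
static int find_next_best_node(int node, int n_nodes, const int *dist,
                               int *used)
{
    int best = -1, min_val = INT_MAX;

    if (!used[node]) {              /* the local node always comes first */
        used[node] = 1;
        return node;
    }
    for (int n = 0; n < n_nodes; n++) {
        if (used[n])
            continue;
        int val = dist[node * n_nodes + n];
        val += (n < node);          /* prefer the next node over lower ids */
        val = val * SCALE + node_load[n];
        if (val < min_val) {
            min_val = val;
            best = n;
        }
    }
    if (best >= 0)
        used[best] = 1;
    return best;
}

/*
 * Fill order[node * n_nodes + i] with the i-th fallback of each node.
 * accumulate == 0: the load is overwritten ("=").
 * accumulate != 0: the load is added up across nodes ("+=").
 */
static void build_fallback_lists(int n_nodes, const int *dist, int accumulate,
                                 int *order)
{
    assert(n_nodes > 0 && n_nodes <= MAX_NODES);
    memset(node_load, 0, sizeof(node_load));

    for (int local = 0; local < n_nodes; local++) {
        int used[MAX_NODES] = {0};
        int load = n_nodes, prev = local, nr = 0, node;

        while ((node = find_next_best_node(local, n_nodes, dist, used)) >= 0) {
            /* Penalise the first node of each new distance group. */
            if (dist[local * n_nodes + node] != dist[local * n_nodes + prev]) {
                if (accumulate)
                    node_load[node] += load;
                else
                    node_load[node] = load;
            }
            order[local * n_nodes + nr++] = node;
            prev = node;
            load--;
        }
    }
}
```

Under these assumptions, calling build_fallback_lists(8, dist, 0, order)
with the 8-node matrix above should give the "Before" table, and passing
accumulate = 1 should give the "After the fix" table.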

Regards,
Bharata.