Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set

From: Donet Tom
Date: Fri Apr 11 2025 - 07:42:25 EST



On 4/11/25 4:29 PM, Mike Rapoport wrote:
On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
On 4/10/25 1:37 PM, Mike Rapoport wrote:
On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
set, we iterate over all PFNs in the memory block and use
early_pfn_to_nid to find the NID until a match is found.

This patch we are using curr_node_memblock_intersect_memory_block() to
check if the current node's memblock intersects with the memory block
passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
is found, the memory block is added to the current node.

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
for finding the NID will continue to be used.
I don't think we really need different mechanisms for different settings of
CONFIG_DEFERRED_STRUCT_PAGE_INIT.

node_dev_init() runs after all struct pages are already initialized and can
always use pfn_to_nid().

In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, we perform a binary search in the memblock region to
determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
the pfn's nid.

Your point is that we could unify this logic and always use
pfn_to_nid() to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that
correct?
Yes, struct pages should be ready by the time node_dev_init() is called
even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.

ok.

Thanks Mike.

kernel_init_freeable() ->
page_alloc_init_late(); /* completes initialization of deferred pages */
...
do_basic_setup() ->
driver_init() ->
node_dev_init();

The next step could be refactoring register_mem_block_under_node_early() to
loop over memblock regions rather than over pfns.
So it the current implementation

node_dev_init()
    register_one_node
        register_memory_blocks_under_node
            walk_memory_blocks()
                register_mem_block_under_node_early
                    get_nid_for_pfn

We get each node's start and end PFN from the pg_data. Using these
values, we determine the memory block's start and end within the
current node. To identify the node to which these memory block
belongs,we iterate over each PFN in the range.

The problem I am facing is,

In my system node4 has a memory block ranging from memory30351
to memory38524, and memory128433. The memory blocks between
memory38524 and memory128433 do not belong to this node.

In  walk_memory_blocks() we iterate over all memory blocks starting
from memory38524 to memory128433.
In register_mem_block_under_node_early(), up to memory38524, the
first pfn correctly returns the corresponding nid and the function
returns from there. But after memory38524 and until memory128433,
the loop iterates through each pfn and checks the nid. Since the nid
does not match the required nid, the loop continues. This causes
the soft lockups.

This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, as a binary search is used to determine the PFN's nid. When
this configuration is disabled, pfn_to_nid is faster, and the issue does
not seen.( Faster because nid is getting from page)

To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, I added this function that iterates over all memblock regions
for each memory block to determine its nid.

"Loop over memblock regions instead of iterating over PFNs" -
My question is - in register_one_node, do you mean that we should iterate
over all memblock regions, identify the regions belonging to the current
node, and then retrieve the corresponding memory blocks to register them
under that node?
I looked more closely at register_mem_block_under_node_early() and
iteration over memblock regions won't make sense there.

It might make sense to use for_each_mem_range() as top level loop in
node_dev_init(), but that's a separate topic.

Yes, this makes sense to me as well. So in your opinion, instead of adding
a new memblock search function like I added , it's better to use
|for_each_mem_range()| in|node_dev_init()|, which would work for all
cases—regardless of whether|CONFIG_DEFERRED_STRUCT_PAGE_INIT| is set or
not. Right?


Thanks
Donet