On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
On 4/10/25 1:37 PM, Mike Rapoport wrote:Yes, struct pages should be ready by the time node_dev_init() is called
On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT isI don't think we really need different mechanisms for different settings of
set, we iterate over all PFNs in the memory block and use
early_pfn_to_nid to find the NID until a match is found.
This patch we are using curr_node_memblock_intersect_memory_block() to
check if the current node's memblock intersects with the memory block
passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
is found, the memory block is added to the current node.
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
for finding the NID will continue to be used.
CONFIG_DEFERRED_STRUCT_PAGE_INIT.
node_dev_init() runs after all struct pages are already initialized and can
always use pfn_to_nid().
In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, we perform a binary search in the memblock region to
determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
the pfn's nid.
Your point is that we could unify this logic and always use
pfn_to_nid() to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that
correct?
even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.
I looked more closely at register_mem_block_under_node_early() andkernel_init_freeable() ->So it the current implementation
page_alloc_init_late(); /* completes initialization of deferred pages */
...
do_basic_setup() ->
driver_init() ->
node_dev_init();
The next step could be refactoring register_mem_block_under_node_early() to
loop over memblock regions rather than over pfns.
node_dev_init()
register_one_node
register_memory_blocks_under_node
walk_memory_blocks()
register_mem_block_under_node_early
get_nid_for_pfn
We get each node's start and end PFN from the pg_data. Using these
values, we determine the memory block's start and end within the
current node. To identify the node to which these memory block
belongs,we iterate over each PFN in the range.
The problem I am facing is,
In my system node4 has a memory block ranging from memory30351
to memory38524, and memory128433. The memory blocks between
memory38524 and memory128433 do not belong to this node.
In walk_memory_blocks() we iterate over all memory blocks starting
from memory38524 to memory128433.
In register_mem_block_under_node_early(), up to memory38524, the
first pfn correctly returns the corresponding nid and the function
returns from there. But after memory38524 and until memory128433,
the loop iterates through each pfn and checks the nid. Since the nid
does not match the required nid, the loop continues. This causes
the soft lockups.
This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, as a binary search is used to determine the PFN's nid. When
this configuration is disabled, pfn_to_nid is faster, and the issue does
not seen.( Faster because nid is getting from page)
To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, I added this function that iterates over all memblock regions
for each memory block to determine its nid.
"Loop over memblock regions instead of iterating over PFNs" -
My question is - in register_one_node, do you mean that we should iterate
over all memblock regions, identify the regions belonging to the current
node, and then retrieve the corresponding memory blocks to register them
under that node?
iteration over memblock regions won't make sense there.
It might make sense to use for_each_mem_range() as top level loop in
node_dev_init(), but that's a separate topic.
Thanks
Donet