Re: [PATCH v7 2/3] kho: fix deferred init of kho scratch
From: Mike Rapoport
Date: Sun Mar 22 2026 - 10:46:19 EST
On Thu, Mar 19, 2026 at 07:17:48PM +0100, Michał Cłapiński wrote:
> On Thu, Mar 19, 2026 at 8:54 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 18, 2026 at 01:36:07PM -0400, Zi Yan wrote:
> > > On 18 Mar 2026, at 13:19, Michał Cłapiński wrote:
> > > > On Wed, Mar 18, 2026 at 6:08 PM Zi Yan <ziy@xxxxxxxxxx> wrote:
> > > >>
> > > >> ## Call site analysis
> > > >>
> > > >> init_pageblock_migratetype() has nine call sites. The init call ordering
> > > >> relevant to scratch is:
> > > >>
> > > >> ```
> > > >> setup_arch()
> > > >>   zone_sizes_init() -> free_area_init() -> memmap_init_range() [1]
> >
> > Hmm, this is slightly outdated, but largely correct :)
> >
> > > >>
> > > >> mm_init_free_all() / start_kernel():
> > > >>   kho_memory_init() -> kho_release_scratch() [2]
> > > >>   memblock_free_all()
> > > >>     free_low_memory_core_early()
> > > >>       memmap_init_reserved_pages()
> > > >>         reserve_bootmem_region() -> __init_deferred_page()
> > > >>           -> __init_page_from_nid() [3]
> > > >> deferred init kthreads -> __init_page_from_nid() [4]
> >
> > And this is wrong: deferred init does not call __init_page_from_nid();
> > only reserve_bootmem_region() does.
> >
> > And there's a case Claude missed:
> >
> > hugetlb_bootmem_free_invalid_page() -> __init_page_from_nid(), which
> > shouldn't check for KHO. Well, at least not until we have support for
> > hugetlb persistence, and most probably not even afterwards.
> >
> > I don't think we should modify reserve_bootmem_region(). If there are
> > reserved pages in a pageblock, it does not matter whether the pageblock is
> > initialized to MIGRATE_CMA. It only becomes important when the reserved
> > pages are freed, so we can update the pageblock migrate type in
> > free_reserved_area().
> > When we boot with KHO, all memblock allocations come from scratch, so
> > anything freed in free_reserved_area() should become CMA again.
>
> What happens if the reserved area covers one page and that page is
> pageblock aligned? Then the pageblock won't be marked as CMA until the page
> is freed, and an unmovable allocation might land in that pageblock, right?
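
The scenario above can be sketched with a toy userspace model (not kernel code; every name here is hypothetical). It assumes 512-page pageblocks (pageblock_order 9 with 4K pages), that a pageblock's migratetype is set only when its pageblock-aligned first pfn is initialized, and that an unmodified reserve_bootmem_region() leaves a reserved page with the default MIGRATE_MOVABLE type, modelled as TOY_MOVABLE. The "fix-up on free" step stands in for the free_reserved_area() proposal.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of pageblock migratetypes during boot; not kernel code.
 * Assumption: 512 pages per pageblock (pageblock_order 9, 4K pages).
 */
#define PB_NR_PAGES	512UL
#define NR_PB		4UL
#define NR_PFNS		(PB_NR_PAGES * NR_PB)

enum toy_mt { TOY_MOVABLE, TOY_CMA };

/* One migratetype per pageblock; zero-initialized to TOY_MOVABLE. */
static enum toy_mt pb_mt[NR_PB];

/* In this model the whole pfn range is KHO scratch. */
static bool toy_in_scratch(unsigned long pfn)
{
	return pfn < NR_PFNS;
}

/*
 * Hypothetical stand-in for initializing one struct page: only a
 * pageblock-aligned pfn sets the pageblock's migratetype.
 */
static void toy_init_page(unsigned long pfn, bool reserved)
{
	assert(pfn < NR_PFNS);
	if (pfn % PB_NR_PAGES)
		return;
	/* Reserved pages keep the default; free scratch pages become CMA. */
	pb_mt[pfn / PB_NR_PAGES] =
		(!reserved && toy_in_scratch(pfn)) ? TOY_CMA : TOY_MOVABLE;
}

/* Proposed fix-up: freeing a reserved scratch page restores CMA. */
static void toy_free_reserved_page(unsigned long pfn)
{
	assert(pfn < NR_PFNS);
	if (toy_in_scratch(pfn))
		pb_mt[pfn / PB_NR_PAGES] = TOY_CMA;
}

/* Reserve exactly one pfn and initialize every page in the model. */
static void toy_boot(unsigned long reserved_pfn)
{
	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
		toy_init_page(pfn, pfn == reserved_pfn);
}
```

Calling toy_boot(PB_NR_PAGES) reserves only the first page of pageblock 1. Pageblocks 0, 2 and 3 come out as TOY_CMA, but pageblock 1 stays TOY_MOVABLE until toy_free_reserved_page(PB_NR_PAGES) runs, which is the window described above.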
>
> > +__init_memblock struct memblock_region *memblock_region_from_iter(u64 iterator)
> > +{
> > + int index = iterator & 0xffffffff;
>
> I'm not sure about this. __next_mem_range() has this code:
> /*
>  * The region which ends first is
>  * advanced for the next iteration.
>  */
> if (m_end <= r_end)
> 	idx_a++;
> else
> 	idx_b++;
>
> Therefore, the index you get from this might be correct or it might
> already be incremented.
Hmm, right, missed that :/
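
Michał's point about __next_mem_range() can be reproduced with a simplified userspace model of the two-index walk (hypothetical names, not the memblock code). Like __next_mem_range(), it packs idx_a into the low 32 bits of the iterator and idx_b into the high 32 bits, and it advances whichever range ends first before returning. Unlike the real function, the second array directly holds the ranges to intersect with, rather than the gaps between type_b regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of __next_mem_range()'s two-index walk; not kernel code. */
struct toy_range {
	uint64_t base;
	uint64_t end;	/* exclusive */
};

/*
 * Find the next overlap of sorted, non-overlapping a[] and b[] and store
 * it in [*out_start, *out_end). The iterator packs idx_a in the low 32
 * bits and idx_b in the high 32 bits, and is advanced *before* returning.
 */
static bool toy_next_range(uint64_t *iter,
			   const struct toy_range *a, uint32_t na,
			   const struct toy_range *b, uint32_t nb,
			   uint64_t *out_start, uint64_t *out_end)
{
	assert(iter && out_start && out_end);

	uint32_t idx_a = (uint32_t)*iter;
	uint32_t idx_b = (uint32_t)(*iter >> 32);

	while (idx_a < na && idx_b < nb) {
		uint64_t m_start = a[idx_a].base, m_end = a[idx_a].end;
		uint64_t r_start = b[idx_b].base, r_end = b[idx_b].end;

		if (r_start >= m_end) {		/* b[idx_b] is past a[idx_a] */
			idx_a++;
			continue;
		}
		if (m_start >= r_end) {		/* b[idx_b] ends before a[idx_a] */
			idx_b++;
			continue;
		}
		*out_start = m_start > r_start ? m_start : r_start;
		*out_end = m_end < r_end ? m_end : r_end;

		/* The region which ends first is advanced for the next iteration. */
		if (m_end <= r_end)
			idx_a++;
		else
			idx_b++;
		*iter = (uint64_t)idx_a | ((uint64_t)idx_b << 32);
		return true;
	}
	return false;
}
```

With a = {[0, 100), [200, 300)} and b = {[0, 1000)}, the first call returns [0, 100), which came from a[0], yet the low 32 bits of the iterator are already 1. With the roles swapped so that the a region ends last, the low bits still point at the returned region, so decoding the index from the iterator is only sometimes correct.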
Still, we can check whether an address is inside scratch in
reserve_bootmem_region() and in deferred_init_pages(), and set the migrate
type to CMA in that case.
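
A minimal sketch of that address check, assuming the scratch ranges are available as a sorted array of non-overlapping (base, size) pairs. This is a toy userspace model, not Mike's actual patch and not the memblock API; toy_addr_in_scratch() and toy_pageblock_mt() are hypothetical names. The idea is that the result would pick the migratetype used when a pageblock-aligned address is initialized in either of the two places named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; names are hypothetical, not the kernel/memblock API. */
enum toy_migratetype { TOY_MIGRATE_MOVABLE, TOY_MIGRATE_CMA };

struct toy_scratch {
	uint64_t base;
	uint64_t size;
};

/* Binary search over sorted, non-overlapping scratch ranges. */
static bool toy_addr_in_scratch(const struct toy_scratch *s, size_t n,
				uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(s != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < s[mid].base)
			hi = mid;
		else if (addr >= s[mid].base + s[mid].size)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/* Migratetype for the pageblock that starts at addr. */
static enum toy_migratetype toy_pageblock_mt(const struct toy_scratch *s,
					     size_t n, uint64_t addr)
{
	return toy_addr_in_scratch(s, n, addr) ? TOY_MIGRATE_CMA
					       : TOY_MIGRATE_MOVABLE;
}
```

A binary search keeps the per-pageblock check logarithmic in the number of scratch ranges; with only a handful of ranges a linear scan would likely be just as fast.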
I think something like the patch below should work. It might not be the
most optimized, but it localizes the changes to mm_init and memblock and
does not complicate the code (well, almost).
The patch is on top of
https://lore.kernel.org/linux-mm/20260322143144.3540679-1-rppt@xxxxxxxxxx/T/#u
and I pushed the entire set here:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho-deferred-init
It compiles and passes the KHO self test with deferred struct page init both
enabled and disabled, but I haven't done any further testing yet.