Re: [PATCH v2 43/53] selftests/mm: migration: add setup of HugeTLB pages
From: Mike Rapoport
Date: Sun Apr 26 2026 - 06:58:57 EST
On Fri, Apr 24, 2026 at 01:11:45PM -0400, Luiz Capitulino wrote:
> On 2026-04-18 06:55, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@xxxxxxxxxx>
> >
> > migration skips HugeTLB tests if there are no free huge pages
> > prepared by a wrapper script.
> >
> > Add setup of HugeTLB pages to the test and make sure that the original
> > settings are restored on the test exit.
> >
> > Since kselftest_harness runs fixture setup and the tests in child
> > processes, use HUGETLB_SETUP_DEFAULT_PAGES() that defines a constructor
> > that runs in the main process and add verification that there are enough
> > free huge pages to the tests that use them.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx>
> > ---
> > tools/testing/selftests/mm/migration.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
> > index ccf42002ce86..61fb00953f83 100644
> > --- a/tools/testing/selftests/mm/migration.c
> > +++ b/tools/testing/selftests/mm/migration.c
> > @@ -23,6 +23,8 @@
> > #define MAX_RETRIES 100
> > #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
> > +HUGETLB_SETUP_DEFAULT_PAGES(1)
>
> Hey Mike,
>
> I've been reviewing and testing this series and got a reproducible issue
> with this test when running it on an x86 KVM guest with 88 vCPUs.
>
> The issue is that, when executing the full MM suite with
> sudo ./run_vmtests.sh -d -a, all 6 migration tests pass but the test
> doesn't exit. Instead, it gets stuck after this output:
>
> """
> # # PASSED: 6 / 6 tests passed.
> # # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
> """
>
> Getting a backtrace from gdb I see:
>
> """
> #0 0x00007efd2f2c247b in __lll_lock_wait_private () from /lib64/libc.so.6
> #1 0x00007efd2f26fa88 in __run_exit_handlers () from /lib64/libc.so.6
> #2 0x00007efd2f26fabe in exit () from /lib64/libc.so.6
> #3 0x0000000000404f2e in hugepage_restore_settings_sighandler ()
> #4 <signal handler called>
> #5 0x00007efd2f32f416 in __unregister_atfork () from /lib64/libc.so.6
> #6 0x00007efd2f26f338 in __cxa_finalize () from /lib64/libc.so.6
> #7 0x00007efd2f4548c7 in __do_global_dtors_aux () from /lib64/libm.so.6
> #8 0x00007ffd66ae0320 in ?? ()
> #9 0x00007efd2f55b2d2 in _dl_call_fini (closure_map=0x7efd2f5500c0) at dl-call_fini.c:43
> """
>
> Could we be messing with libc internal state somehow? I also get systemd
> services hung when I try to reboot.
I don't think we are messing with libc internal state, but we definitely
leave zombies around.
All the tests that fork() terminate the children with kill(), but they never
call wait*() to collect the exit status.
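To illustrate the kill()-then-wait pattern that is missing, here is a minimal sketch. The helper names spawn_idle_child() and kill_and_reap() are made up for this example and are not taken from migration.c or from the updated branch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles until it is signalled. */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;	/* -1 if fork() failed */
}

/*
 * Hypothetical helper: terminate a child and collect its exit status so
 * it does not stay around as a zombie. Returns the signal that ended the
 * child, 0 if it exited on its own, or -1 on error.
 */
static int kill_and_reap(pid_t pid, int sig)
{
	int status;
	pid_t ret;

	assert(pid > 0);
	if (kill(pid, sig))
		return -1;

	/* Retry if waitpid() is interrupted by an unrelated signal. */
	do {
		ret = waitpid(pid, &status, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A test would call kill_and_reap() for each pid it forked instead of a bare kill(). Once the status is collected, a second waitpid() on the same pid fails with ECHILD, which shows that no zombie is left.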
> Some of the migration tests fork() and then kill() their child
> processes. Won't those all restore the hugetlb state concurrently
> from hugepage_restore_settings_atexit()?
Yeah, I missed the kill()s :/
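One possible way to keep forked children from restoring the settings is to remember which process changed them and to check that in the restore path. This is only a sketch of the idea with made-up names (settings_owner, record_settings_owner(), owns_settings()); it is not necessarily how the updated branch handles it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PID of the process that changed the settings (hypothetical name). */
static pid_t settings_owner;

/* Call once in the main process, for example from the constructor. */
static void record_settings_owner(void)
{
	settings_owner = getpid();
}

/*
 * getpid() is async-signal-safe, so this check can also be used from a
 * signal handler. record_settings_owner() must have run first.
 */
static int owns_settings(void)
{
	assert(settings_owner != 0);
	return getpid() == settings_owner;
}

/*
 * Fork a child and report whether the child considered itself the owner.
 * Returns 0 or 1, or -1 on error.
 */
static int child_owns_settings(void)
{
	int status;
	pid_t pid = fork();

	if (pid == 0)
		_exit(owns_settings());
	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With such a guard, the restore function would return early when owns_settings() is false, so only the main process writes the original values back.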
> Also, for shared_anon_htlb, don't we need to reserve a HugeTLB page per
> child?
We only mmap() a single huge page in the parent, the children don't create
new mappings.
> And there's another issue: when running the migration test individually,
> private_anon_htlb gets skipped. I guess it's because the previous test
> is restoring the HugeTLB state:
It could be.
I pushed an updated version that has fixes for both the zombie and the
signal issues:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=mm-selftest/v3
Would be great if you can test it in your setup.
--
Sincerely yours,
Mike.