RE: [PATCHv2 1/1] lib/tests: test_ratelimit: fix stress test thread lifecycle and leak
From: Justin He
Date: Fri May 29 2026 - 04:20:49 EST
> -----Original Message-----
> From: Petr Mladek <pmladek@xxxxxxxx>
> Sent: Thursday, May 21, 2026 8:51 PM
> To: Justin He <Justin.He@xxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>; Kees Cook <kees@xxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCHv2 1/1] lib/tests: test_ratelimit: fix stress test thread
> lifecycle and leak
>
> On Tue 2026-05-19 09:37:45, Jia He wrote:
> > The stress test checks the kthread pointer in the child thread, but
> > the thread was created with kthread_run(). Since kthread_run() wakes
> > the thread before returning, the child can run before sktp[i].tp is
> > assigned, triggering the WARN_ON_ONCE() check on an arm64 N2 server
> with 128 cores.
>
> True.
>
> > Use kthread_create() instead, assign sktp[i].tp after successful
> > creation, and then wake the thread explicitly. This guarantees that
> > the child observes an initialized task pointer.
>
> IMHO, this does not guarantee that the task pointer has been assigned.
> AFAIK, there might be spurious wakeups, for example, by the suspend/resume
> or livepatching code.
>
> I have another idea, see below.
>
> > Also add a common cleanup path for thread creation failures. If
> > creating one of the later threads fails, stop all threads that were
> > already started and free the allocated array instead of leaving orphan
> > kthreads and leaked memory behind.
> >
> > Finally, replace the module-static doneflag with kthread_should_stop().
> > With the doneflag, child threads may exit before the parent calls
> > kthread_stop(), so the task lifetime is no longer guaranteed when the
> > parent later tries to stop them. Using kthread_should_stop() keeps
> > each child alive until kthread_stop() synchronously terminates it.
>
> This part looks good to me.
>
> > Signed-off-by: Jia He <justin.he@xxxxxxx>
> > ---
> > v2: replace the module-static doneflag with kthread_should_stop() to
> > avoid UAF (sashiko-bot) lib/tests/test_ratelimit.c | 37
> > ++++++++++++++++++++++++++-----------
> > 1 file changed, 26 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/tests/test_ratelimit.c b/lib/tests/test_ratelimit.c
> > index 33cea5f3d28b..d95c49e914c7 100644
> > --- a/lib/tests/test_ratelimit.c
> > +++ b/lib/tests/test_ratelimit.c
> > @@ -68,7 +68,6 @@ static void test_ratelimit_smoke(struct kunit *test)
> > static struct ratelimit_state stressrl =
> RATELIMIT_STATE_INIT_FLAGS("stressrl", HZ / 10, 3,
> >
> RATELIMIT_MSG_ON_RELEASE);
> >
> > -static int doneflag;
> > static const int stress_duration = 2 * HZ;
> >
> > struct stress_kthread {
> > @@ -86,7 +85,7 @@ static int test_ratelimit_stress_child(void *arg)
> > set_user_nice(current, MAX_NICE);
> > WARN_ON_ONCE(!sktp->tp);
>
> It seem that "sktp->tp" is not really used in this function.
> I am not sure why the WARN_ON_ONCE() is there.
>
> It would simply remove this check and keep the original kthread_run().
>
> >
> > - while (!READ_ONCE(doneflag)) {
> > + while (!kthread_should_stop()) {
> > sktp->nattempts++;
> > if (___ratelimit(&stressrl, __func__))
> > sktp->nunlimited++;
>
Thanks Petr, makes sense.
Since the child does not really use sktp->tp, I’ll just remove the
WARN_ON_ONCE() and keep kthread_run().
I’ll still keep the kthread_should_stop() conversion and the cleanup path
for partial creation failures.
---
Cheers,
Justin He(Jia He)