Re: (sashiko) [PATCH 01/10] mm/damon/core: introduce damon_ctx->paused

From: SeongJae Park

Date: Sat Mar 21 2026 - 16:29:14 EST


TL; DR: I think this needs a few changes. That can be a fixup (picked by
Andrew) or respin for v2.

On Sat, 21 Mar 2026 13:12:04 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:

> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@xxxxxxxxxx
> # start of sashiko.dev inline review
> commit b56d17c42c1dc6d883f3a75d785a05ba53b5c35b
> Author: SeongJae Park <sj@xxxxxxxxxx>
>
> mm/damon/core: introduce damon_ctx->paused
>
> This commit introduces a paused parameter to the DAMON context API.
> It allows users to pause and resume kdamond execution while preserving
> self-trained data. The main loop is updated to perform limited work
> and sleep when the context is paused.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..ce010009ca13 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > kdamond_call(ctx, false);
> > if (ctx->maybe_corrupted)
> > break;
> > + while (ctx->pause) {
> > + if (kdamond_need_stop(ctx))
> > + goto done;
> > + kdamond_usleep(ctx->attrs.sample_interval);
>
> This isn't a bug, but polling with kdamond_usleep() using the sample
> interval could cause many wakeups per second even while the context is
> paused. Would it be more efficient to use a waitqueue here to consume
> near-zero CPU until the thread is explicitly woken?

sample internval is 5ms by default and recommended auto-tuning setup makes it
hundreds of milliseconds. So I don't think such change is required.

>
> > + /* allow caller unset pause via damon_call() */
> > + kdamond_call(ctx, false);
>
> If a parameter update fails during kdamond_call() and sets
> ctx->maybe_corrupted to true, the loop will continue to run or exit the
> pause loop and proceed to kdamond_apply_schemes() with corrupted
> structures.
>
> Is it possible to bypass the corruption safeguard here? Should we check
> ctx->maybe_corrupted inside this loop to safely terminate the thread?

Good finding. I will add that in the next spin.

>
> > + damos_walk_cancel(ctx);
>
> If kdamond_call() processes a request that unpauses the context, this
> will immediately cancel any pending or concurrent damos_walk() requests
> before the loop condition is re-evaluated.
>
> Could this cause spurious cancellations of walk requests when unpausing
> the context?

As I replied in the previous patch, that can happen and fine. Users can do the
walk again in the case. But, now I agree it may better to reduce such spurious
failures if it can be done in a simple way. And seems it can be. Just doing
this before the sleep.

>
> > + }
> > if (!list_empty(&ctx->schemes))
> > kdamond_apply_schemes(ctx);
> > else
>
>
> # end of sashiko.dev inline review
> # review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@xxxxxxxxxx

So, I think below attaching fixup is needed. Unless Andrew picks it up
together, I will post v2 tomorrow.


Thanks,
SJ
=== >8 ===