Re: (sashiko review) [RFC PATCH v5 01/10] mm/damon/core: introduce damon_ctx->paused

From: SeongJae Park

Date: Tue Mar 24 2026 - 00:07:31 EST


On Mon, 23 Mar 2026 18:28:00 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:

> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260323231538.84452-2-sj@xxxxxxxxxx
> # start of sashiko.dev inline review
> commit 011f7b451b6e4a657677ee09eada15a7980e8480
> Author: SeongJae Park <sj@xxxxxxxxxx>
>
> mm/damon/core: introduce damon_ctx->paused
>
> This commit introduces a pause parameter to the DAMON context, allowing
> the kdamond main loop to be paused and resumed without losing its
> self-trained internal data.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..0ab2cfa848e6 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > kdamond_call(ctx, false);
> > if (ctx->maybe_corrupted)
> > break;
> > + while (ctx->pause) {
> > + damos_walk_cancel(ctx);
> > + kdamond_usleep(ctx->attrs.sample_interval);
> > + /* allow caller unset pause via damon_call() */
> > + kdamond_call(ctx, false);
> > + if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
> > + goto done;
> > + }
>
> Since this loop is located after ops.prepare_access_checks(), kdamond_usleep(),
> and ops.check_accesses(), does this unconditionally perform a full monitoring
> cycle before honoring the paused state?
>
> If a context is started with pause set to true, or if pause is enabled while
> waiting in kdamond_wait_activation(), it appears it will still execute one
> complete active sampling cycle. Could this cause unexpected disruption, such
> as clearing PTE accessed bits, when the context is explicitly configured to
> be paused?

Yes, DAMON would behave in the way. And having one more sampling work doesn't
cause a real issue.

>
> Additionally, when the context is paused, kdamond_fn remains in this sleep
> loop and skips calling damos_adjust_quota(), which means quota->charged_from
> is frozen.
>
> If the pause lasts for more than LONG_MAX jiffies (e.g., about 24.8 days on
> a 32-bit system with HZ=1000), will system jiffies wrap past the half-range
> limit of the unsigned long type relative to the frozen timestamp?
>
> If this wraparound occurs, time_after_eq(jiffies, quota->charged_from +
> interval) would evaluate to false when unpaused. Would this cause the quota
> window to not reset for another 24.8 days, accumulating charged_sz and
> blocking the scheme from being applied?

That's a wild corner case, but I agree it is better to avoid the problematic
case. I'm still thinking about the good way for that. Anyway, I will address
this in the next spin.



Thanks,
SJ

[...]