Re: [PATCH] mm/damon: simplify stop mechanism

From: Changbin Du
Date: Wed Oct 27 2021 - 02:13:54 EST


On Tue, Oct 26, 2021 at 06:42:03PM +0000, SeongJae Park wrote:
> Hello Changbin,
>
> On Tue, 26 Oct 2021 23:30:33 +0800 Changbin Du <changbin.du@xxxxxxxxx> wrote:
>
> > A kernel thread can exit gracefully with kthread_stop(). So we don't need a
> > new flag 'kdamond_stop'. And to make sure the task struct is not freed when
> > accessing it, get task struct on start and put it on stop.
>
> We previously considered using kthread_stop() here. However, we ended up with
> the current code because kdamond can self-terminate when all target processes
> are invalid[1].
>
> This patch also seems not fully prepared for the self-termination case. I
> left some comments below.
>
> [1] https://lore.kernel.org/linux-mm/20210624102623.24563-1-sjpark@xxxxxxxxx/
>
> >
> > And since the return value of the 'before_terminate' callback is never used,
> > make it return void.
>
> This looks nice to me. Could you please send this again as a separate patch?
>
Sure, I'll do it later.

> >
> > Signed-off-by: Changbin Du <changbin.du@xxxxxxxxx>
> > ---
> > include/linux/damon.h | 3 +--
> > mm/damon/core.c | 59 +++++++++++++------------------------------
> > mm/damon/dbgfs.c | 5 ++--
> > 3 files changed, 20 insertions(+), 47 deletions(-)
> >
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index a14b3cc54cab..041966786270 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> [...]
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [...]
> > @@ -1069,7 +1048,7 @@ static int kdamond_fn(void *data)
> >  					sz_limit);
> >  			if (ctx->callback.after_aggregation &&
> >  					ctx->callback.after_aggregation(ctx))
> > -				set_kdamond_stop(ctx);
> > +				done = true;
> >  			kdamond_apply_schemes(ctx);
> >  			kdamond_reset_aggregated(ctx);
> >  			kdamond_split_regions(ctx);
> > @@ -1088,16 +1067,12 @@ static int kdamond_fn(void *data)
> >  			damon_destroy_region(r, t);
> >  	}
> >
> > -	if (ctx->callback.before_terminate &&
> > -			ctx->callback.before_terminate(ctx))
> > -		set_kdamond_stop(ctx);
> > +	if (ctx->callback.before_terminate)
> > +		ctx->callback.before_terminate(ctx);
> >  	if (ctx->primitive.cleanup)
> >  		ctx->primitive.cleanup(ctx);
> >
> >  	pr_debug("kdamond (%d) finishes\n", current->pid);
> > -	mutex_lock(&ctx->kdamond_lock);
> > -	ctx->kdamond = NULL;
> > -	mutex_unlock(&ctx->kdamond_lock);
>
> When kdamond self-terminates, ctx->kdamond is not nullified. As a result,
> this patch can introduce errors like the one below:
>
> # cd /sys/kernel/debug/damon
> # sleep 60 &
> [1] 1926
> # echo $(pidof sleep) > target_ids
> # echo on > monitor_on
> # cat monitor_on
> on
> # # after 60 seconds, sleep finishes and kdamond is self-terminated
> # cat monitor_on
> off
> # echo 42 > target_ids
> bash: echo: write error: Device or resource busy
>
> If we simply restore the nullification here with the mutex locking, we would
> end up in a deadlock, because __damon_stop() calls kthread_stop() while holding
> ctx->kdamond_lock.
>
> Also, the reference count of ctx->kdamond, which was increased by
> __damon_start(), would not be decreased in that case.
>
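The lock-ordering problem described above can be illustrated with a small user-space sketch. It is an illustration only, not DAMON code, and rests on these assumptions: a pthread mutex stands in for ctx->kdamond_lock, pthread_join() stands in for kthread_stop() waiting for the thread to exit, and the names model_stop() and worker_exit_path() are hypothetical. The worker's exit path uses a 100 ms pthread_mutex_timedlock() instead of a plain lock, so it reports ETIMEDOUT rather than hanging when the stopper still holds the lock.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for ctx->kdamond_lock. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Worker exit path: take the lock to clear the context's thread pointer.
 * A 100 ms timed lock replaces a plain lock, so the would-be deadlock
 * shows up as ETIMEDOUT instead of a hang.
 */
static void *worker_exit_path(void *arg)
{
	int *result = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 100L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}
	*result = pthread_mutex_timedlock(&lock, &deadline);
	if (*result == 0) {
		/* ...the thread pointer would be set to NULL here... */
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/*
 * Stop path: pthread_join() models kthread_stop() waiting for the thread
 * to exit. Returns the worker's lock result (0 or ETIMEDOUT).
 */
static int model_stop(bool hold_lock_while_waiting)
{
	pthread_t worker;
	int result = -1;

	pthread_mutex_lock(&lock);
	if (!hold_lock_while_waiting)
		pthread_mutex_unlock(&lock);
	if (pthread_create(&worker, NULL, worker_exit_path, &result) != 0)
		abort();
	pthread_join(worker, NULL);
	if (hold_lock_while_waiting)
		pthread_mutex_unlock(&lock);
	assert(result == 0 || result == ETIMEDOUT);
	return result;
}
```

With the lock held across the wait, model_stop(true) returns ETIMEDOUT because the worker can never take the lock; dropping the lock before waiting, as model_stop(false) does, lets the exit path proceed and returns 0.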

If so, I suppose the change below should work correctly (kdamond still sets
ctx->kdamond to NULL at its end).

 static int __damon_stop(struct damon_ctx *ctx)
 {
+	struct task_struct *tsk;
+
 	mutex_lock(&ctx->kdamond_lock);
-	if (ctx->kdamond) {
-		ctx->kdamond_stop = true;
+	tsk = ctx->kdamond;
+	if (tsk) {
+		get_task_struct(tsk);
 		mutex_unlock(&ctx->kdamond_lock);
-		while (damon_kdamond_running(ctx))
-			usleep_range(ctx->sample_interval,
-					ctx->sample_interval * 2);
+		kthread_stop(tsk);
+		put_task_struct(tsk);
 		return 0;
 	}
 	mutex_unlock(&ctx->kdamond_lock);
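A minimal user-space model of this flow may help check both cases raised above (external stop and self-termination). It is only a sketch under stated assumptions: pthreads stand in for kthreads, an atomic reference count stands in for get_task_struct()/put_task_struct(), an atomic flag stands in for kthread_should_stop(), and the stopper polls an "exited" flag where kthread_stop() would wait for the thread to exit. All names (struct ctx, struct task, ctx_start(), ctx_stop(), ctx_running()) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-ins for struct damon_ctx and a kthread's task_struct. */
struct ctx {
	pthread_mutex_t lock;    /* plays ctx->kdamond_lock */
	struct task *worker;     /* plays ctx->kdamond */
};

struct task {
	struct ctx *ctx;
	atomic_int refs;         /* get_task_struct() / put_task_struct() */
	atomic_bool should_stop; /* kthread_should_stop() */
	atomic_bool exited;      /* what kthread_stop() waits for */
	bool self_terminate;     /* exit unasked, like kdamond with no valid target */
};

static void task_put(struct task *t)
{
	int old = atomic_fetch_sub(&t->refs, 1);
	assert(old > 0);         /* catches an unbalanced put */
	if (old == 1)
		free(t);
}

static void *worker_fn(void *arg)
{
	struct task *t = arg;

	while (!atomic_load(&t->should_stop) && !t->self_terminate)
		usleep(1000);    /* one "sampling interval" */

	/* As proposed: the worker itself clears the pointer when it ends. */
	pthread_mutex_lock(&t->ctx->lock);
	t->ctx->worker = NULL;
	pthread_mutex_unlock(&t->ctx->lock);
	atomic_store(&t->exited, true);
	task_put(t);             /* drop the running thread's own reference */
	return NULL;
}

static int ctx_start(struct ctx *c, bool self_terminate)
{
	pthread_t thread;
	struct task *t;

	pthread_mutex_lock(&c->lock);
	if (c->worker) {         /* cf. the EBUSY in the debugfs session */
		pthread_mutex_unlock(&c->lock);
		return -EBUSY;
	}
	t = calloc(1, sizeof(*t));
	if (!t) abort();
	t->ctx = c;
	t->self_terminate = self_terminate;
	atomic_init(&t->refs, 1);
	atomic_init(&t->should_stop, false);
	atomic_init(&t->exited, false);
	c->worker = t;
	if (pthread_create(&thread, NULL, worker_fn, t) != 0)
		abort();
	pthread_detach(thread);  /* never joined, like a kthread */
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/* Mirrors the proposed __damon_stop(): pin under the lock, wait without it. */
static int ctx_stop(struct ctx *c)
{
	struct task *t;

	pthread_mutex_lock(&c->lock);
	t = c->worker;
	if (t) {
		atomic_fetch_add(&t->refs, 1);          /* get_task_struct() */
		pthread_mutex_unlock(&c->lock);
		atomic_store(&t->should_stop, true);    /* kthread_stop(): ask... */
		while (!atomic_load(&t->exited))        /* ...then wait for exit */
			usleep(1000);
		task_put(t);                            /* put_task_struct() */
		return 0;
	}
	pthread_mutex_unlock(&c->lock);
	return -EPERM;
}

static bool ctx_running(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	bool running = c->worker != NULL;
	pthread_mutex_unlock(&c->lock);
	return running;
}
```

The sketch relies on two properties of the change: the pointer is read and pinned while the lock is held, so the worker cannot drop its last reference in between; and the wait happens after the lock is released, so the worker's own exit path can still take the lock to clear the pointer. After a self-termination the pointer is already NULL, so a later ctx_start() succeeds instead of returning -EBUSY, and ctx_stop() returns -EPERM.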


> If I'm missing something, please let me know.
>
>
> Thanks,
> SJ
>
> [...]

--
Cheers,
Changbin Du