Re: [PATCH v2 2/2] mm/damon/core: implement multi-context support
From: Alex Rusuf
Date: Sun Jun 02 2024 - 15:48:41 EST
[...]
> >
> > >
> > > >
> > > > /* private: */
> > > > /* for waiting until the execution of the kdamond_fn is started */
> > > > @@ -634,7 +632,10 @@ struct damon_ctx {
> > > > * update
> > > > */
> > > > unsigned long next_ops_update_sis;
> > > > + /* upper limit for each monitoring region */
> > > > unsigned long sz_limit;
> > > > + /* marker to check if context is valid */
> > > > + bool valid;
> > >
> > > What is the validity?
> >
> > This is a "flag" which indicates that the context is "valid" for kdamond
> > to be able to call ->check_accesses() on it. Because we have many contexts
> > we need a way to identify which context is valid while iterating over
> > all of them (please, see kdamond_prepare_access_checks_ctx() function).
>
> It's still not very clear to me. When is it "valid" or not for kdamond to be
> able to call ->check_accesses()? I assume you mean it is valid if the
> monitoring operations set's ->target_valid() returns zero?
>
> The callback is not for preventing unnecessary ->check_accesses(), but for
> knowing whether continuing the monitoring makes sense or not. For example, the
> callback of the 'vaddr' operations set checks if the virtual address space
> still exists (whether the target process is still running). Calling
> ->check_accesses() for a target whose ->target_valid() returned non-zero is
> totally ok, though it is meaningless. And therefore kdamond stops if it
> returns non-zero.
>
> >
> > Maybe the name should be changed,
>
> At least some more detailed comment would be appreciated, imo.
>
> > but now I don't see how we could merge
> > this into kdamond_valid_ctx() or so, because the main cause of this "valid"
> > bit is that there are no more "valid" targets for this context, but also
> > ->after_sampling() could have returned something bad.
>
> As mentioned above, calling ->check_accesses() or others towards invalidated
> targets (e.g., terminated processes' virtual address spaces) should be ok, if
> any of the targets are still valid. So I don't see special technical
> difficulties here. Please let me know if I'm missing something.
This is true only in the single-context case. There kdamond can simply stop
when its one context has no valid targets left (e.g. the address space no
longer exists). With many contexts, however, we need to check whether any of
the contexts still has valid target(s). For example, say we have 3 contexts
per kdamond, and at some point the 1st context has no valid targets left (its
process has finished) while the other 2 contexts still do. Then we should not
call ->prepare_access_checks() and ->check_accesses() for the 1st context,
but we do need to call them for the other two.
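Roughly, the per-context preparation I have in mind looks like the sketch
below. It is not the literal v2 code; damon_ctx_has_valid_targets() is a
hypothetical helper standing in for the per-target ->target_valid() checks,
and ctx->valid is the new flag added by this patch:

/*
 * Sketch only: decide per context whether access checks make sense,
 * instead of stopping the whole kdamond when one context runs out of
 * valid targets.
 */
static bool kdamond_prepare_access_checks_ctx(struct damon_ctx *ctx)
{
        if (!damon_ctx_has_valid_targets(ctx)) {
                /* e.g. every target process of this context has exited */
                ctx->valid = false;
                return false;
        }

        if (ctx->ops.prepare_access_checks)
                ctx->ops.prepare_access_checks(ctx);
        ctx->valid = true;
        return true;
}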
We could call kdamond_valid_ctx() right before ->check_accesses(), but then
we would also need to check that nothing bad has been returned from the
->after_sampling() call before we are allowed to go on and call
->check_accesses().
I decided to use a "valid" flag inside damon_ctx structure, so
that if this context isn't valid, we will just skip it while
iterating over all contexts. In case of just 1 context
kdamond_prepare_access_checks() will return false, so that
nr_valid_ctxs will remain zero, so we will break "while"
loop, otherwise we'll see that there's at least one valid
context to call ->check_accesses() for.
The last point is that we need to understand for which
context we can call ->check_accesses(), this is done
by checking ctx->valid bit, if it is "false" (we already called
kdamond_valid_ctx() and ->after_sampling(), and one of this calls
failed) we just skip this context by doing "goto next_ctx;"
in main kdamond loop.
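To make the flow concrete, the main loop then ends up shaped roughly as
below. This is only a sketch of the intent, not the patch itself; the stop
condition and the sampling/aggregation details are simplified, and
kdamond_prepare_access_checks_ctx() is the per-context helper sketched above:

        while (!kthread_should_stop()) {        /* real stop checks are richer */
                unsigned int nr_valid_ctxs = 0;
                struct damon_ctx *ctx;

                list_for_each_entry(ctx, &kdamond->contexts, list) {
                        if (kdamond_prepare_access_checks_ctx(ctx))
                                nr_valid_ctxs++;
                }
                if (!nr_valid_ctxs)     /* no context has a valid target left */
                        break;

                /* sleep for the sampling interval (details omitted) ... */

                list_for_each_entry(ctx, &kdamond->contexts, list) {
                        if (!ctx->valid)        /* e.g. ->after_sampling() failed */
                                continue;       /* the "goto next_ctx" case */
                        ctx->ops.check_accesses(ctx);
                        /* ... aggregation, applying schemes, etc. ... */
                }
        }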
>
> >
> > >
> > > >
> > > > /* public: */
> > > > struct kdamond *kdamond;
> > > > @@ -682,6 +683,12 @@ static inline struct damon_ctx *damon_first_ctx(struct kdamond *kdamond)
> > > > return list_first_entry(&kdamond->contexts, struct damon_ctx, list);
> > > > }
> > > >
> > > > +static inline bool damon_is_last_ctx(struct damon_ctx *ctx,
> > > > + struct kdamond *kdamond)
> > > > +{
> > > > + return list_is_last(&ctx->list, &kdamond->contexts);
> > > > +}
> > > > +
> > > > #define damon_for_each_region(r, t) \
> > > > list_for_each_entry(r, &t->regions_list, list)
> > > >
> > > > diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
> > > > index 23200aabc..d5287566c 100644
> > > > --- a/include/trace/events/damon.h
> > > > +++ b/include/trace/events/damon.h
> > >
> > > Let's separate this change to another patch.
> >
> > Separating the patches, we would hardly be able to reach at least build
> > consistency between them. Moreover, DAMON won't be able
> > to function correctly in between.
>
> I agree that it's not very easy, indeed. But let's try. In terms of
> functionality, we need to keep the old behavior that is visible to users. For
> example, this change tries to change the traceevent for the multi-context
> support. That is for making the behavior _better_, not for keeping the old
> behavior. Rather than that, it is introducing a new change to the tracepoint
> output. Just make no change here. Users may get confused when they use
> multiple contexts, but what they see is not changed.
>
> Further, you can delay letting users (user-space) using the multiple contexts
> (allowing >1 input to nr_contexts of DAMON sysfs interface) after making this
> change in a separate patch.
>
> >
> > >
[...]
> > > >
> > > > -static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
> > > > - struct damon_region *r, struct damos *s)
> > > > +static void damos_apply_scheme(unsigned int cidx, struct damon_ctx *c,
> > > > + struct damon_target *t, struct damon_region *r,
> > > > + struct damos *s)
> > >
> > > Unnecessary change.
> >
> > What do you mean? Why is this unnecessary? Now DAMON iterates over
> > contexts and calls kdamond_apply_schemes(ctx), so how can we know
> > which context we print the trace for? Sorry, maybe I misunderstood
> > how DAMON does it, but I'm a bit confused.
>
> I believe the above comment on the traceevent change explains this.
Just to make it clear, do you mean separating this change into a different,
smaller patch?
[...]
BR,
Alex