Re: Re: [PATCH v6 02/14] mm/damon: Implement region based sampling

From: SeongJae Park
Date: Tue Mar 10 2020 - 07:53:15 EST


Added replies to your every comment in line below. I agree to your whole
opinions, will apply those in next spin! :)

On Tue, 10 Mar 2020 08:57:21 +0000 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Mon, 24 Feb 2020 13:30:35 +0100
> SeongJae Park <sjpark@xxxxxxxxxx> wrote:
>
> > From: SeongJae Park <sjpark@xxxxxxxxx>
> >
> > This commit implements DAMON's basic access check and region based
> > sampling mechanisms. This change would seems make no sense, mainly
> > because it is only a part of the DAMON's logics. Following two commits
> > will make more sense.
> >
> > This commit also exports `lookup_page_ext()` to GPL modules because
> > DAMON uses the function but also supports the module build.
>
> Do that as a separate patch before this one. Makes it easy to spot.

Agreed, will do so.

>
> >
[...]
>
> Various things inline. In particularly can you make use of standard
> kthread_stop infrastructure rather than rolling your own?

Nice suggestion! That will be much better, will use it.

>
> > ---
> > mm/damon.c | 509 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > mm/page_ext.c | 1 +
> > 2 files changed, 510 insertions(+)
> >
> > diff --git a/mm/damon.c b/mm/damon.c
> > index aafdca35b7b8..6bdeb84d89af 100644
> > --- a/mm/damon.c
> > +++ b/mm/damon.c
> > @@ -9,9 +9,14 @@
> >
[...]
> > +/*
> > + * Get the mm_struct of the given task
> > + *
> > + * Callser should put the mm_struct after use, unless it is NULL.
>
> Caller

Good eye! Will fix it.

>
> > + *
> > + * Returns the mm_struct of the task on success, NULL on failure
> > + */
> > +static struct mm_struct *damon_get_mm(struct damon_task *t)
> > +{
> > + struct task_struct *task;
> > + struct mm_struct *mm;
> > +
> > + task = damon_get_task_struct(t);
> > + if (!task)
> > + return NULL;
> > +
> > + mm = get_task_mm(task);
> > + put_task_struct(task);
> > + return mm;
> > +}
> > +
> > +/*
> > + * Size-evenly split a region into 'nr_pieces' small regions
> > + *
> > + * Returns 0 on success, or negative error code otherwise.
> > + */
> > +static int damon_split_region_evenly(struct damon_ctx *ctx,
> > + struct damon_region *r, unsigned int nr_pieces)
> > +{
> > + unsigned long sz_orig, sz_piece, orig_end;
> > + struct damon_region *piece = NULL, *next;
> > + unsigned long start;
> > +
> > + if (!r || !nr_pieces)
> > + return -EINVAL;
> > +
> > + orig_end = r->vm_end;
> > + sz_orig = r->vm_end - r->vm_start;
> > + sz_piece = sz_orig / nr_pieces;
> > +
> > + if (!sz_piece)
> > + return -EINVAL;
> > +
> > + r->vm_end = r->vm_start + sz_piece;
> > + next = damon_next_region(r);
> > + for (start = r->vm_end; start + sz_piece <= orig_end;
> > + start += sz_piece) {
> > + piece = damon_new_region(ctx, start, start + sz_piece);
> > + damon_add_region(piece, r, next);
> > + r = piece;
> > + }
>
> I'd add a comment here. I think this next bit is to catch any rounding error
> holes, but I'm not 100% sure.

Yes, will make it clearer.

>
> > + if (piece)
> > + piece->vm_end = orig_end;
>
> blank line here.

Will add.

>
> > + return 0;
> > +}
[...]
> > +/*
> > + * Initialize the monitoring target regions for the given task
> > + *
> > + * t the given target task
> > + *
> > + * Because only a number of small portions of the entire address space
> > + * is acutally mapped to the memory and accessed, monitoring the unmapped
>
> actually

Good eye! Will consider adding these in 'scripts/spelling.txt'.

>
[...]
> > +/*
> > + * Check whether the given region has accessed since the last check
>
> Should also make clear that this sets us up for the next access check at
> a different memory address it the region.
>
> Given the lack of connection between activities perhaps just split this into
> two functions that are always called next to each other.

Will make the description more clearer as suggested.

Also, I found that I'm not clearing *pte and *pmd before going 'mkold', thanks
to this comment. Will fix it, either.

>
> > + *
> > + * mm 'mm_struct' for the given virtual address space
> > + * r the region to be checked
> > + */
> > +static void kdamond_check_access(struct damon_ctx *ctx,
> > + struct mm_struct *mm, struct damon_region *r)
> > +{
> > + pte_t *pte = NULL;
> > + pmd_t *pmd = NULL;
> > + spinlock_t *ptl;
> > +
> > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> > + goto mkold;
> > +
> > + /* Read the page table access bit of the page */
> > + if (pte && pte_young(*pte))
> > + r->nr_accesses++;
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>
> Is it worth having this protection? Seems likely to have only a very small
> influence on performance and makes it a little harder to reason about the code.

It was necessary for addressing 'implicit declaration' problem of 'pmd_young()'
and 'pmd_mkold()' for build of DAMON on several architectures including User
Mode Linux.

Will modularize the code for better readability.

>
> > + else if (pmd && pmd_young(*pmd))
> > + r->nr_accesses++;
> > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > +
> > + spin_unlock(ptl);
> > +
> > +mkold:
> > + /* mkold next target */
> > + r->sampling_addr = damon_rand(ctx, r->vm_start, r->vm_end);
> > +
> > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> > + return;
> > +
> > + if (pte) {
> > + if (pte_young(*pte)) {
> > + clear_page_idle(pte_page(*pte));
> > + set_page_young(pte_page(*pte));
> > + }
> > + *pte = pte_mkold(*pte);
> > + }
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + else if (pmd) {
> > + if (pmd_young(*pmd)) {
> > + clear_page_idle(pmd_page(*pmd));
> > + set_page_young(pmd_page(*pmd));
> > + }
> > + *pmd = pmd_mkold(*pmd);
> > + }
> > +#endif
> > +
> > + spin_unlock(ptl);
> > +}
> > +
> > +/*
> > + * Check whether a time interval is elapsed
>
> Another comment block that would be clearer if it was kernel-doc rather
> than nearly kernel-doc

Will apply the kernel-doc syntax.

>
> > + *
> > + * baseline the time to check whether the interval has elapsed since
> > + * interval the time interval (microseconds)
> > + *
> > + * See whether the given time interval has passed since the given baseline
> > + * time. If so, it also updates the baseline to current time for next check.
> > + *
> > + * Returns true if the time interval has passed, or false otherwise.
> > + */
> > +static bool damon_check_reset_time_interval(struct timespec64 *baseline,
> > + unsigned long interval)
> > +{
> > + struct timespec64 now;
> > +
> > + ktime_get_coarse_ts64(&now);
> > + if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
> > + interval * 1000)
> > + return false;
> > + *baseline = now;
> > + return true;
> > +}
> > +
> > +/*
> > + * Check whether it is time to flush the aggregated information
> > + */
> > +static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
> > +{
> > + return damon_check_reset_time_interval(&ctx->last_aggregation,
> > + ctx->aggr_interval);
> > +}
> > +
> > +/*
> > + * Reset the aggregated monitoring results
> > + */
> > +static void kdamond_flush_aggregated(struct damon_ctx *c)
>
> I wouldn't expect a reset function to be called flush.

It will work as flushing in next commit, but it makes no sense now. Will
rename it.

>
> > +{
> > + struct damon_task *t;
> > + struct damon_region *r;
> > +
> > + damon_for_each_task(c, t) {
> > + damon_for_each_region(r, t)
> > + r->nr_accesses = 0;
> > + }
> > +}
> > +
> > +/*
> > + * Check whether current monitoring should be stopped
> > + *
> > + * If users asked to stop, need stop. Even though no user has asked to stop,
> > + * need stop if every target task has dead.
> > + *
> > + * Returns true if need to stop current monitoring.
> > + */
> > +static bool kdamond_need_stop(struct damon_ctx *ctx)
> > +{
> > + struct damon_task *t;
> > + struct task_struct *task;
> > + bool stop;
> > +
>
> As below comment asks, can you use kthread_should_stop?

Yes, I will.

>
> > + spin_lock(&ctx->kdamond_lock);
> > + stop = ctx->kdamond_stop;
> > + spin_unlock(&ctx->kdamond_lock);
> > + if (stop)
> > + return true;
> > +
> > + damon_for_each_task(ctx, t) {
> > + task = damon_get_task_struct(t);
> > + if (task) {
> > + put_task_struct(task);
> > + return false;
> > + }
> > + }
> > +
> > + return true;
> > +}
> > +
> > +/*
> > + * The monitoring daemon that runs as a kernel thread
> > + */
> > +static int kdamond_fn(void *data)
> > +{
> > + struct damon_ctx *ctx = (struct damon_ctx *)data;
>
> Never any need to explicitly cast a void * to some other pointer type.
> (C spec)

Ah, you're right.

>
> struct damon_ctx *ctx = data;
> > + struct damon_task *t;
> > + struct damon_region *r, *next;
> > + struct mm_struct *mm;
> > +
> > + pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
> > + kdamond_init_regions(ctx);
> > + while (!kdamond_need_stop(ctx)) {
> > + damon_for_each_task(ctx, t) {
> > + mm = damon_get_mm(t);
> > + if (!mm)
> > + continue;
> > + damon_for_each_region(r, t)
> > + kdamond_check_access(ctx, mm, r);
> > + mmput(mm);
> > + }
> > +
> > + if (kdamond_aggregate_interval_passed(ctx))
> > + kdamond_flush_aggregated(ctx);
> > +
> > + usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
>
> Is there any purpose in using a range for such a narrow window?

Actually, it needs to sleep only 'ctx->sample_interval', and thus I set the
interval so narrow.

>
> > + }
> > + damon_for_each_task(ctx, t) {
> > + damon_for_each_region_safe(r, next, t)
> > + damon_destroy_region(r);
> > + }
> > + pr_info("kdamond (%d) finishes\n", ctx->kdamond->pid);
>
> Feels like noise. I'd drop tis to pr_debug.

Agreed, will remove it.

>
> > + spin_lock(&ctx->kdamond_lock);
> > + ctx->kdamond = NULL;
> > + spin_unlock(&ctx->kdamond_lock);
>
> blank line.

Yup!

>
> > + return 0;
> > +}
> > +
> > +/*
> > + * Controller functions
> > + */
> > +
> > +/*
> > + * Start or stop the kdamond
> > + *
> > + * Returns 0 if success, negative error code otherwise.
> > + */
> > +static int damon_turn_kdamond(struct damon_ctx *ctx, bool on)
> > +{
> > + spin_lock(&ctx->kdamond_lock);
> > + ctx->kdamond_stop = !on;
>
> Can't use the kthread_stop / kthread_should_stop approach?

Will use it.

>
> > + if (!ctx->kdamond && on) {
> > + ctx->kdamond = kthread_run(kdamond_fn, ctx, "kdamond");
> > + if (!ctx->kdamond)
> > + goto fail;
> > + goto success;
>
> cleaner as
> int ret = 0; above then
>
> if (!ctx->kdamond)
> ret = -EINVAL;
> goto unlock;
>
> with
>
> unlock:
> spin_unlock(&ctx->dmanond_lock);
> return ret;

Agreed, will change so.

>
> > + }
> > + if (ctx->kdamond && !on) {
> > + spin_unlock(&ctx->kdamond_lock);
> > + while (true) {
>
> An unbounded loop is probably a bad idea.

Will add clear condition here.

>
> > + spin_lock(&ctx->kdamond_lock);
> > + if (!ctx->kdamond)
> > + goto success;
> > + spin_unlock(&ctx->kdamond_lock);
> > +
> > + usleep_range(ctx->sample_interval,
> > + ctx->sample_interval * 2);
> > + }
> > + }
> > +
> > + /* tried to turn on while turned on, or turn off while turned off */
> > +
> > +fail:
> > + spin_unlock(&ctx->kdamond_lock);
> > + return -EINVAL;
> > +
> > +success:
> > + spin_unlock(&ctx->kdamond_lock);
> > + return 0;
> > +}
> > +
> > +/*
> > + * This function should not be called while the kdamond is running.
> > + */
> > +static int damon_set_pids(struct damon_ctx *ctx,
> > + unsigned long *pids, ssize_t nr_pids)
> > +{
> > + ssize_t i;
> > + struct damon_task *t, *next;
> > +
> > + damon_for_each_task_safe(ctx, t, next)
> > + damon_destroy_task(t);
> > +
> > + for (i = 0; i < nr_pids; i++) {
> > + t = damon_new_task(pids[i]);
> > + if (!t) {
> > + pr_err("Failed to alloc damon_task\n");
> > + return -ENOMEM;
> > + }
> > + damon_add_task_tail(ctx, t);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/*
>
> This is kind of similar to kernel-doc formatting. Might as well just make
> it kernel-doc!

Agreed, will do so!

>
> > + * Set attributes for the monitoring
> > + *
> > + * sample_int time interval between samplings
> > + * aggr_int time interval between aggregations
> > + * min_nr_reg minimal number of regions
> > + *
> > + * This function should not be called while the kdamond is running.
> > + * Every time interval is in micro-seconds.
> > + *
> > + * Returns 0 on success, negative error code otherwise.
> > + */
> > +static int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> > + unsigned long aggr_int, unsigned long min_nr_reg)
> > +{
> > + if (min_nr_reg < 3) {
> > + pr_err("min_nr_regions (%lu) should be bigger than 2\n",
> > + min_nr_reg);
> > + return -EINVAL;
> > + }
> > +
> > + ctx->sample_interval = sample_int;
> > + ctx->aggr_interval = aggr_int;
> > + ctx->min_nr_regions = min_nr_reg;
>
> blank line helps readability a tiny little bit.

Agreed!


Thanks,
SeongJae Park

>
> > + return 0;
> > +}
> > +
> > static int __init damon_init(void)
> > {
> > pr_info("init\n");
> > diff --git a/mm/page_ext.c b/mm/page_ext.c
> > index 4ade843ff588..71169b45bba9 100644
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -131,6 +131,7 @@ struct page_ext *lookup_page_ext(const struct page *page)
> > MAX_ORDER_NR_PAGES);
> > return get_entry(base, index);
> > }
> > +EXPORT_SYMBOL_GPL(lookup_page_ext);
> >
> > static int __init alloc_node_page_ext(int nid)
> > {