Re: [PATCH v22 02/18] mm/damon: Implement region based sampling

From: Shakeel Butt
Date: Wed Nov 25 2020 - 10:29:44 EST


On Tue, Oct 20, 2020 at 2:01 AM SeongJae Park <sjpark@xxxxxxxxxx> wrote:
>
> From: SeongJae Park <sjpark@xxxxxxxxx>
>
> DAMON separates its monitoring target address space independent high
> level logics

Please rewrite the above sentence to be more clear.

> from the target space dependent low level primitives for
> flexible support of various address spaces.
>
> This commit implements DAMON's target address space independent high
> level logics for basic access check and region based sampling.

Same for the above text.

> Hence,
> without the target address space specific parts implementations, this
> doesn't work alone. A reference implementation of those will be
> provided by a later commit.
>
> Basic Access Check
> ==================
>
> The output of DAMON says what pages are how frequently accessed for a
> given duration. The resolution of the access frequency is controlled by
> setting ``sampling interval`` and ``aggregation interval``. In detail,
> DAMON checks access to each page per ``sampling interval`` and
> aggregates the results. In other words, counts the number of the
> accesses to each page. After each ``aggregation interval`` passes,
> DAMON calls callback functions that previously registered by users so
> that users can read the aggregated results and then clears the results.
> This can be described in below simple pseudo-code::
>
> while monitoring_on:
> for page in monitoring_target:
> if accessed(page):
> nr_accesses[page] += 1
> if time() % aggregation_interval == 0:
> for callback in user_registered_callbacks:
> callback(monitoring_target, nr_accesses)
> for page in monitoring_target:
> nr_accesses[page] = 0
> sleep(sampling interval)
>
> The monitoring overhead of this mechanism will arbitrarily increase as
> the size of the target workload grows.
>

Please move the sampling part in a separate patch.

> Region Based Sampling
> =====================
>
> To avoid the unbounded increase of the overhead, DAMON groups adjacent
> pages that assumed to have the same access frequencies into a region.
> As long as the assumption (pages in a region have the same access
> frequencies) is kept, only one page in the region is required to be
> checked. Thus, for each ``sampling interval``, DAMON randomly picks one
> page in each region, waits for one ``sampling interval``, checks whether
> the page is accessed meanwhile, and increases the access frequency of
> the region if so. Therefore, the monitoring overhead is controllable by
> setting the number of regions. DAMON allows users to set the minimum
> and the maximum number of regions for the trade-off.
>
> This scheme, however, cannot preserve the quality of the output if the
> assumption is not guaranteed. Next commit will address this problem.
>
> Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
> Reviewed-by: Leonard Foerster <foersleo@xxxxxxxxx>
> ---
> include/linux/damon.h | 133 ++++++++++++++++-
> mm/damon/core.c | 333 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 465 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 183e0edd7f43..1f7b095646c2 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -8,6 +8,8 @@
> #ifndef _DAMON_H_
> #define _DAMON_H_
>
> +#include <linux/mutex.h>
> +#include <linux/time64.h>
> #include <linux/types.h>
>
> /**
> @@ -23,11 +25,13 @@ struct damon_addr_range {
> /**
> * struct damon_region - Represents a monitoring target region.
> * @ar: The address range of the region.
> + * @sampling_addr: Address of the sample for the next access check.
> * @nr_accesses: Access frequency of this region.
> * @list: List head for siblings.
> */
> struct damon_region {
> struct damon_addr_range ar;
> + unsigned long sampling_addr;
> unsigned int nr_accesses;
> struct list_head list;
> };
> @@ -50,12 +54,130 @@ struct damon_target {
> struct list_head list;
> };
>
> +struct damon_ctx;
> +
> /**
> - * struct damon_ctx - Represents a context for each monitoring.
> + * struct damon_primitive Monitoring primitives for given use cases.
> + *
> + * @init_target_regions: Constructs initial monitoring target regions.
> + * @prepare_access_checks: Prepares next access check of target regions.
> + * @check_accesses: Checks the access of target regions.
> + * @target_valid: Determine if the target is valid.
> + * @cleanup: Cleans up the context.
> + *
> + * DAMON can be extended for various address spaces and usages. For this,
> + * users should register the low level primitives for their target address
> + * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
> + * calls @init_target_regions before starting the monitoring and
> + * @prepare_access_checks, @check_accesses, and @target_valid for each
> + * @sample_interval.
> + *
> + * @init_target_regions should construct proper monitoring target regions and
> + * link those to the DAMON context struct.
> + * @prepare_access_checks should manipulate the monitoring regions to be
> + * prepare for the next access check.
> + * @check_accesses should check the accesses to each region that made after the
> + * last preparation and update the `->nr_accesses` of each region. It should
> + * also return max &damon_region.nr_accesses that made as a result of its
> + * update.
> + * @target_valid should check whether the target is still valid for the
> + * monitoring.
> + * @cleanup is called from @kdamond just before its termination. After this
> + * call, only @kdamond_lock and @kdamond will be touched.
> + */
> +struct damon_primitive {
> + void (*init_target_regions)(struct damon_ctx *context);
> + void (*prepare_access_checks)(struct damon_ctx *context);
> + unsigned int (*check_accesses)(struct damon_ctx *context);
> + bool (*target_valid)(struct damon_target *target);
> + void (*cleanup)(struct damon_ctx *context);
> +};
> +
> +/*
> + * struct damon_callback Monitoring events notification callbacks.
> + *
> + * @before_start: Called before starting the monitoring.
> + * @after_sampling: Called after each sampling.
> + * @after_aggregation: Called after each aggregation.
> + * @before_terminate: Called before terminating the monitoring.
> + * @private: User private data.
> + *
> + * The monitoring thread (&damon_ctx->kdamond) calls @before_start and
> + * @before_terminate just before starting the monitoring and just before
> + * finishing the monitoring. Therefore, those are good places for installing
> + * and cleaning @private.
> + *
> + * The monitoring thread calls @after_sampling and @after_aggregation for each
> + * of the sampling intervals and aggregation intervals, respectively.
> + * Therefore, users can safely access the monitoring results via
> + * &damon_ctx.targets_list without additional protection of
> + * damon_ctx.kdamond_lock. For the reason, users are recommended to use these
> + * callback for the accesses to the results.
> + *
> + * If any callback returns non-zero, monitoring stops.
> + */
> +struct damon_callback {
> + void *private;
> +
> + int (*before_start)(struct damon_ctx *context);
> + int (*after_sampling)(struct damon_ctx *context);
> + int (*after_aggregation)(struct damon_ctx *context);
> + int (*before_terminate)(struct damon_ctx *context);
> +};
> +
> +/**
> + * struct damon_ctx - Represents a context for each monitoring. This is the
> + * main interface that allows users to set the attributes and get the results
> + * of the monitoring.
> + *
> + * @sample_interval: The time between access samplings.
> + * @aggr_interval: The time between monitor results aggregations.
> + * @nr_regions: The number of monitoring regions.
> + *
> + * For each @sample_interval, DAMON checks whether each region is accessed or
> + * not. It aggregates and keeps the access information (number of accesses to
> + * each region) for @aggr_interval time. All time intervals are in
> + * micro-seconds.
> + *
> + * @kdamond: Kernel thread who does the monitoring.
> + * @kdamond_stop: Notifies whether kdamond should stop.
> + * @kdamond_lock: Mutex for the synchronizations with @kdamond.
> + *
> + * For each monitoring context, one kernel thread for the monitoring is
> + * created. The pointer to the thread is stored in @kdamond.
> + *
> + * Once started, the monitoring thread runs until explicitly required to be
> + * terminated or every monitoring target is invalid. The validity of the
> + * targets is checked via the @target_valid callback. The termination can also
> + * be explicitly requested by writing non-zero to @kdamond_stop. The thread
> + * sets @kdamond to NULL when it terminates. Therefore, users can know whether
> + * the monitoring is ongoing or terminated by reading @kdamond. Reads and
> + * writes to @kdamond and @kdamond_stop from outside of the monitoring thread
> + * must be protected by @kdamond_lock.
> + *
> + * Note that the monitoring thread protects only @kdamond and @kdamond_stop via
> + * @kdamond_lock. Accesses to other fields must be protected by themselves.
> + *
> * @targets_list: Head of monitoring targets (&damon_target) list.
> + *
> + * @primitive: Set of monitoring primitives for given use cases.
> + * @callback: Set of callbacks for monitoring events notifications.
> */
> struct damon_ctx {
> + unsigned long sample_interval;
> + unsigned long aggr_interval;
> + unsigned long nr_regions;

IMO region is the property of the target being monitored. There might
be multiple types of target which might need the same abstraction like
damon_region and damon_addr_range but that sharing can be done at that
time.

> +
> + struct timespec64 last_aggregation;
> +
> + struct task_struct *kdamond;
> + bool kdamond_stop;
> + struct mutex kdamond_lock;
> +
> struct list_head targets_list; /* 'damon_target' objects */
> +
> + struct damon_primitive primitive;
> + struct damon_callback callback;
> };
>
> #define damon_next_region(r) \
> @@ -90,6 +212,15 @@ void damon_free_target(struct damon_target *t);
> void damon_destroy_target(struct damon_target *t);
> unsigned int damon_nr_regions(struct damon_target *t);
>
> +int damon_set_targets(struct damon_ctx *ctx,
> + unsigned long *ids, ssize_t nr_ids);
> +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> + unsigned long aggr_int, unsigned long nr_reg);
> +
> +int damon_nr_running_ctxs(void);
> +int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
> +int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);
> +
> #endif /* CONFIG_DAMON */
>
> #endif
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 4562b2458719..eb4ebeaa064d 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -8,12 +8,20 @@
> #define pr_fmt(fmt) "damon: " fmt
>
> #include <linux/damon.h>
> +#include <linux/delay.h>
> +#include <linux/kthread.h>
> #include <linux/slab.h>
>
> +/* Minimal region size. Every damon_region is aligned by this. */
> +#define MIN_REGION PAGE_SIZE
> +
> /*
> * Functions and macros for DAMON data structures
> */
>
> +static DEFINE_MUTEX(damon_lock);
> +static int nr_running_ctxs;
> +
> /*
> * Construct a damon_region struct
> *
> @@ -119,3 +127,328 @@ unsigned int damon_nr_regions(struct damon_target *t)
>
> return nr_regions;
> }
> +
> +/**
> + * damon_set_targets() - Set monitoring targets.
> + * @ctx: monitoring context
> + * @ids: array of target ids
> + * @nr_ids: number of entries in @ids
> + *
> + * This function should not be called while the kdamond is running.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_set_targets(struct damon_ctx *ctx,
> + unsigned long *ids, ssize_t nr_ids)
> +{
> + ssize_t i;
> + struct damon_target *t, *next;
> +
> + damon_for_each_target_safe(t, next, ctx)
> + damon_destroy_target(t);
> +
> + for (i = 0; i < nr_ids; i++) {
> + t = damon_new_target(ids[i]);
> + if (!t) {
> + pr_err("Failed to alloc damon_target\n");
> + return -ENOMEM;
> + }
> + damon_add_target(ctx, t);
> + }
> +
> + return 0;
> +}

This function looks weird and without usage. I suppose this is called
when a user requests to set up monitoring targets. I would change this
to add targets one by one. At the moment, it removes the older targets
and adds newer ones which can fail and will leave the context in a
weird state.

BTW this function should come in the patch which introduces the user interface.

> +
> +/**
> + * damon_set_attrs() - Set attributes for the monitoring.
> + * @ctx: monitoring context
> + * @sample_int: time interval between samplings
> + * @aggr_int: time interval between aggregations
> + * @nr_reg: number of regions
> + *
> + * This function should not be called while the kdamond is running.
> + * Every time interval is in micro-seconds.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
> + unsigned long aggr_int, unsigned long nr_reg)
> +{
> + if (nr_reg < 3) {
> + pr_err("nr_regions (%lu) must be at least 3\n",
> + nr_reg);
> + return -EINVAL;
> + }
> +
> + ctx->sample_interval = sample_int;
> + ctx->aggr_interval = aggr_int;
> + ctx->nr_regions = nr_reg;
> +
> + return 0;
> +}
> +
> +static bool damon_kdamond_running(struct damon_ctx *ctx)
> +{
> + bool running;
> +
> + mutex_lock(&ctx->kdamond_lock);
> + running = ctx->kdamond != NULL;
> + mutex_unlock(&ctx->kdamond_lock);
> +
> + return running;
> +}
> +
> +static int kdamond_fn(void *data);
> +
> +/*
> + * __damon_start() - Starts monitoring with given context.
> + * @ctx: monitoring context
> + *
> + * This function should be called while damon_lock is hold.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +static int __damon_start(struct damon_ctx *ctx)
> +{
> + int err = -EBUSY;
> +
> + mutex_lock(&ctx->kdamond_lock);
> + if (!ctx->kdamond) {
> + err = 0;
> + ctx->kdamond_stop = false;
> + ctx->kdamond = kthread_create(kdamond_fn, ctx, "kdamond.%d",
> + nr_running_ctxs);
> + if (IS_ERR(ctx->kdamond))
> + err = PTR_ERR(ctx->kdamond);
> + else
> + wake_up_process(ctx->kdamond);
> + }
> + mutex_unlock(&ctx->kdamond_lock);
> +
> + return err;
> +}
> +
> +/**
> + * damon_start() - Starts the monitorings for a given group of contexts.
> + * @ctxs: an array of the pointers for contexts to start monitoring
> + * @nr_ctxs: size of @ctxs
> + *
> + * This function starts a group of monitoring threads for a group of monitoring
> + * contexts. One thread per each context is created and run in parallel. The
> + * caller should handle synchronization between the threads by itself. If a
> + * group of threads that created by other 'damon_start()' call is currently
> + * running, this function does nothing but returns -EBUSY.
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_start(struct damon_ctx **ctxs, int nr_ctxs)
> +{
> + int i;
> + int err = 0;
> +
> + mutex_lock(&damon_lock);
> + if (nr_running_ctxs) {

What is this checking?

> + mutex_unlock(&damon_lock);
> + return -EBUSY;
> + }
> +
> + for (i = 0; i < nr_ctxs; i++) {
> + err = __damon_start(ctxs[i]);
> + if (err)
> + break;
> + nr_running_ctxs++;
> + }
> + mutex_unlock(&damon_lock);
> +
> + return err;
> +}
> +
> +/*
> + * __damon_stop() - Stops monitoring of given context.
> + * @ctx: monitoring context
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +static int __damon_stop(struct damon_ctx *ctx)
> +{
> + mutex_lock(&ctx->kdamond_lock);
> + if (ctx->kdamond) {
> + ctx->kdamond_stop = true;
> + mutex_unlock(&ctx->kdamond_lock);
> + while (damon_kdamond_running(ctx))
> + usleep_range(ctx->sample_interval,
> + ctx->sample_interval * 2);
> + return 0;
> + }
> + mutex_unlock(&ctx->kdamond_lock);
> +
> + return -EPERM;
> +}
> +
> +/**
> + * damon_stop() - Stops the monitorings for a given group of contexts.
> + * @ctxs: an array of the pointers for contexts to stop monitoring
> + * @nr_ctxs: size of @ctxs
> + *
> + * Return: 0 on success, negative error code otherwise.
> + */
> +int damon_stop(struct damon_ctx **ctxs, int nr_ctxs)
> +{
> + int i, err = 0;
> +
> + for (i = 0; i < nr_ctxs; i++) {
> + /* nr_running_ctxs is decremented in kdamond_fn */
> + err = __damon_stop(ctxs[i]);
> + if (err)
> + return err;
> + }
> +
> + return err;
> +}
> +
> +/*
> + * Functions for DAMON core logics
> + */
> +
> +/*
> + * damon_check_reset_time_interval() - Check if a time interval is elapsed.
> + * @baseline: the time to check whether the interval has elapsed since
> + * @interval: the time interval (microseconds)
> + *
> + * See whether the given time interval has passed since the given baseline
> + * time. If so, it also updates the baseline to current time for next check.
> + *
> + * Return: true if the time interval has passed, or false otherwise.
> + */
> +static bool damon_check_reset_time_interval(struct timespec64 *baseline,
> + unsigned long interval)
> +{
> + struct timespec64 now;
> +
> + ktime_get_coarse_ts64(&now);
> + if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
> + interval * 1000)
> + return false;
> + *baseline = now;
> + return true;
> +}
> +
> +/*
> + * Check whether it is time to flush the aggregated information
> + */
> +static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
> +{
> + return damon_check_reset_time_interval(&ctx->last_aggregation,
> + ctx->aggr_interval);
> +}
> +
> +/*
> + * Reset the aggregated monitoring results ('nr_accesses' of each region).
> + */
> +static void kdamond_reset_aggregated(struct damon_ctx *c)
> +{
> + struct damon_target *t;
> +
> + damon_for_each_target(t, c) {
> + struct damon_region *r;
> +
> + damon_for_each_region(r, t)
> + r->nr_accesses = 0;
> + }
> +}
> +
> +/*
> + * Check whether current monitoring should be stopped
> + *
> + * The monitoring is stopped when either the user requested to stop, or all
> + * monitoring targets are invalid.
> + *
> + * Returns true if need to stop current monitoring.
> + */
> +static bool kdamond_need_stop(struct damon_ctx *ctx)
> +{
> + struct damon_target *t;
> + bool stop;
> +
> + mutex_lock(&ctx->kdamond_lock);
> + stop = ctx->kdamond_stop;
> + mutex_unlock(&ctx->kdamond_lock);
> + if (stop)
> + return true;
> +
> + if (!ctx->primitive.target_valid)
> + return false;
> +
> + damon_for_each_target(t, ctx) {
> + if (ctx->primitive.target_valid(t))
> + return false;
> + }

Why check for valid target? Are we gonna create a new kthread when the
user set the new target?

> +
> + return true;
> +}
> +
> +static void set_kdamond_stop(struct damon_ctx *ctx, bool stop)
> +{
> + mutex_lock(&ctx->kdamond_lock);
> + ctx->kdamond_stop = stop;
> + mutex_unlock(&ctx->kdamond_lock);
> +}
> +
> +#define kdamond_call_prmt(ctx, fn) \
> + do { \
> + if (ctx->primitive.fn) \
> + ctx->primitive.fn(ctx); \
> + } while (0)
> +
> +#define kdamond_callback(ctx, fn) \
> + do { \
> + if (ctx->callback.fn && ctx->callback.fn(ctx)) \
> + set_kdamond_stop(ctx, true); \
> + } while (0)

I don't think these macros are helpful rather they make code harder to
grasp. Also don't hide the function call like set_kdamond_stop() in a
macro.

> +
> +/*
> + * The monitoring daemon that runs as a kernel thread
> + */
> +static int kdamond_fn(void *data)
> +{
> + struct damon_ctx *ctx = (struct damon_ctx *)data;
> + struct damon_target *t;
> + struct damon_region *r, *next;
> +
> + pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
> +

Call the callbacks explicitly.


> + kdamond_call_prmt(ctx, init_target_regions);
> + kdamond_callback(ctx, before_start);
> +
> + while (!kdamond_need_stop(ctx)) {
> + kdamond_call_prmt(ctx, prepare_access_checks);
> + kdamond_callback(ctx, after_sampling);
> +
> + usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
> +
> + kdamond_call_prmt(ctx, check_accesses);
> +
> + if (kdamond_aggregate_interval_passed(ctx)) {
> + kdamond_callback(ctx, after_aggregation);
> + kdamond_reset_aggregated(ctx);
> + }
> + }
> + damon_for_each_target(t, ctx) {
> + damon_for_each_region_safe(r, next, t)
> + damon_destroy_region(r);
> + }
> +
> + kdamond_callback(ctx, before_terminate);
> + kdamond_call_prmt(ctx, cleanup);
> +
> + pr_debug("kdamond (%d) finishes\n", ctx->kdamond->pid);
> + mutex_lock(&ctx->kdamond_lock);
> + ctx->kdamond = NULL;
> + mutex_unlock(&ctx->kdamond_lock);
> +
> + mutex_lock(&damon_lock);
> + nr_running_ctxs--;
> + mutex_unlock(&damon_lock);
> +
> + do_exit(0);
> +}
> --
> 2.17.1
>