[PATCH 1/5] Docs/mm/damon/design: add monitoring parameters tuning guide

From: SeongJae Park
Date: Fri Jan 10 2025 - 14:01:26 EST


DAMON monitoring parameters including sampling and aggregation intervals
should be tuned for given workloads. However, the fact is not
explicitly documented. Also there is no official guide to help the
tuning. This apparently confused a number of people[1] at best, or made
people forgive DAMON without tuning. Add a guide on the design
document.

[1] https://lore.kernel.org/20241202175459.2005526-1-sj@xxxxxxxxxx

Signed-off-by: SeongJae Park <sj@xxxxxxxxxx>
---
Documentation/mm/damon/design.rst | 48 +++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 667775bab86c..dd7e0f63a69a 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -203,6 +203,8 @@ This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


+.. _damon_design_adaptive_regions_adjustment:
+
Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -264,6 +266,52 @@ tracepoints. For more details, please refer to the documentations for
respectively.


+Monitoring Parameters Tuning Guide
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In short, set ``aggregation interval`` to capture meaningful amount of accesses
+for the purpose. The amount of accesses can be measured using ``nr_accesses``
+and ``age`` of regions in the aggregated monitoring results snapshot. The
+default value of the interval, ``100ms``, turns out to be too short in many
+cases. Set ``sampling interval`` proportional to ``aggregation interval``. By
+default, ``1/20`` is recommended as the ratio.
+
+``Aggregation interval`` should be set as the time interval that the workload
+can make an amount of accesses for the monitoring purpose, within the interval.
+If the interval is too short, only small number of accesses are captured. As a
+result, the monitoring results look everything is samely accessed only rarely.
+For many purposes, that would be useless. If it is too long, however, the time
+to converge regions with the :ref:`regions adjustment mechanism
+<damon_design_adaptive_regions_adjustment>` can be too long, depending on the
+time scale of the given purpose. This could happen if the workload is actually
+making only rare accesses but the user thinks the amount of accesses for the
+monitoring purpose too high. For such cases, the target amount of access to
+capture per ``aggregation interval`` should carefully reconsidered. Also, note
+that the captured amount of accesses is represented with not only
+``nr_accesses``, but also ``age``. For example, even if every region on the
+monitoring results show zero ``nr_accesses``, regions could still be
+distinguished using ``age`` values as the recency information.
+
+Hence the optimum value of ``aggregation interval`` depends on the access
+intensiveness of the workload. The user should tune the interval based on the
+amount of access that captured on each aggregated snapshot of the monitoring
+results.
+
+Note that the default value of the interval is 100 milliseconds, which is too
+short in many cases, especially on large systems.
+
+``Sampling interval`` defines the resolution of each aggregation. If it is set
+too large, monitoring results will look like every region was samely rarely
+accessed, or samely frequently accessed. That is, regions become
+undistinguishable based on access pattern, and therefore the results will be
+useless in many use cases. If ``sampling interval`` is too small, it will not
+degrade the resolution, but will increase the monitoring overhead. If it is
+appropriate enough to provide a resolution of the monitoring results that
+sufficient for the given purpose, it shouldn't be unnecessarily further
+lowered. It is recommended to be set proportional to ``aggregation interval``.
+By default, the ratio is set as ``1/20``, and it is still recommended.
+
+
.. _damon_design_damos:

Operation Schemes
--
2.39.5