[RFC v8 8/8] Documentation/admin-guide/mm: Document DAMON-based operation schemes

From: SeongJae Park
Date: Tue May 12 2020 - 07:58:09 EST


From: SeongJae Park <sjpark@xxxxxxxxx>

This commit documents DAMON-based operation schemes in the DAMON
document.

Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
---
.../admin-guide/mm/data_access_monitor.rst | 100 +++++++++++++++++-
1 file changed, 98 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/data_access_monitor.rst b/Documentation/admin-guide/mm/data_access_monitor.rst
index 915956aa1065..d4a48bc63400 100644
--- a/Documentation/admin-guide/mm/data_access_monitor.rst
+++ b/Documentation/admin-guide/mm/data_access_monitor.rst
@@ -182,8 +182,8 @@ only for each of a user-specified time interval (``regions update interval``).
``debugfs`` Interface
=====================

-DAMON exports four files, ``attrs``, ``pids``, ``record``, and ``monitor_on``
-under its debugfs directory, ``<debugfs>/damon/``.
+DAMON exports five files, ``attrs``, ``pids``, ``record``, ``schemes`` and
+``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.

Attributes
----------
@@ -227,6 +227,46 @@ be 4 KiB and the result to be saved in ``/damon.data``::
# cat record
4096 /damon.data

+Schemes
+-------
+
+For usual DAMON-based data access awared memory management optimizations, users
+would simply want the system to apply a memory management action to a memory
+region of a specific size having a specific access frequency for a specific
+time. DAMON receives such formalized operation schemes from user and applies
+those to the target processes.
+
+Users can get and set the schemes by reading from and writing to ``schemes``
+debugfs file. To the file, each of the schemes should represented in each line
+in below form:
+
+ min-size max-size min-acc max-acc min-age max-age action
+
+Bytes for size of regions (``min-size`` and ``max-size``), number of monitored
+accesses per aggregate interval for access frequency (``min-acc`` and
+``max-acc``), number of aggregate intervals for age of regions (``min-age`` and
+``max-age``), and predefined integer for memory management actions should be
+used. ``madvise()`` system call with specific hint are currently available.
+The numbers and their representing memory hint are as below::
+
+ 0 MADV_WILLNEED
+ 1 MADV_COLD
+ 2 MADV_PAGEOUT
+ 3 MADV_HUGEPAGE
+ 4 MADV_NOHUGEPAGE
+
+You can disable schemes by simply writing empty string to the file. For
+example, below commands applies a scheme saying âIf a memory region larger than
+4 KiB (4096 0) is showing less than 5 accesses per aggregate interval (0 5) for
+more than 5 aggregate interval (5 0), page out the region (2)â, check the
+entered scheme again, and finally remove the scheme.::
+
+ # cd <debugfs>/damon
+ # echo "4096 0 0 5 5 0 2" > schemes
+ # cat schemes
+ 4096 0 0 5 5 0 2
+ # echo > schemes
+
Turning On/Off
--------------

@@ -426,3 +466,59 @@ made.

Users can specify the resolution of the distribution (``--range``). It also
supports 'gnuplot' based simple visualization (``--plot``) of the distribution.
+
+
+DAMON-based Operation Schemes
+-----------------------------
+
+The ``schemes`` subcommand applies given data access pattern based operation
+schemes to the given target processes. The target processes are described
+using the command to spawn the processes or pid of running processes, as
+similar to that of ``record`` subcommand. Meanwhile, the operation schemes
+should be saved in a text file using below format and passed to ``schemes``
+subcommand via ``--schemes`` option.
+
+ min-size max-size min-acc max-acc min-age max-age action
+
+The format also supports comments, several units for size and age of regions,
+and human readable action names. Currently supported operation actions are
+WILLNEED, COLD, PAGEOUT, HUGEPAGE, and NOHUGEPAGE. Each of the actions works
+as same to that of madvise() system call. Below is an example schemes file.
+Please also note that 0 for max values means infinite.::
+
+ # format is:
+ # <min/max size> <min/max frequency (0-99)> <min/max age> <action>
+ #
+ # B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+ # us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+ # 'null' means zero for size and age.
+
+ # if a region keeps a high access frequency for more than 100ms, put the
+ # region on the head of the LRU list (call madvise() with MADV_WILLNEED).
+ null null 80 null 100ms 0s willneed
+
+ # if a region keeps a low access frequency for more than 200ms and less
+ # than one hour, put the # region on the tail of the LRU list (call
+ # madvise() with MADV_COLD).
+ 0B 0B 10 20 200ms 1h cold
+
+ # if a region keeps a very low access frequency for more than 1 minute,
+ # swap out the region immediately (call madvise() with MADV_PAGEOUT).
+ 0B null 0 10 60s 0s pageout
+
+ # if a region of a size bigger than 2MiB keeps a very high access frequency
+ # for more than 100ms, let the region to use huge pages (call madvise()
+ # with MADV_HUGEPAGE).
+ 2M null 90 99 100ms 0s hugepage
+
+ # If a regions of a size bigger than 2MiB keeps small access frequency for
+ # more than 100ms, avoid the region using huge pages (call madvise() with
+ # MADV_NOHUGEPAGE).
+ 2M null 0 25 100ms 0s nohugepage
+
+For example, you can make a running process named 'foo' to use THP for memory
+regions keeping 2MB or larger size and having very high access frequency for
+more than 100 milliseconds using below commands::
+
+ $ echo "2M null 90 99 100ms 0s hugepage" > my_thp_scheme
+ $ ./damo schemes --schemes my_thp_scheme `pidof foo`
--
2.17.1