Re: [PATCH v34 00/13] Introduce Data Access MONitor (DAMON)

From: Shakeel Butt
Date: Tue Jul 27 2021 - 17:30:56 EST

(reduced CC list)

Hi all,

I have been asked to comment on whether Google is interested in using
this feature, on its general usefulness, and on whether it is
sufficiently general and non-duplicative. I will try to answer these,
but first I will explain the use-cases we are particularly interested
in and for which we want a general access monitoring mechanism.

At the moment Google is particularly interested in four use-cases:

1) Working set estimation: This is used for cluster level scheduling
and controlling the knobs of memory overcommit.

2) Proactive reclaim

3) Balancing between memory tiers: Moving hot pages to fast tiers and
cold pages to slow tiers

4) Hugepage optimization: Backing hot memory with hugepages

In addition, these use-cases do not occur in isolation. We want a
combination of them running concurrently on a system. So it is clear
that the first version or step of DAMON, which only targets virtual
address space monitoring, is not sufficient for these use-cases.

I think the more important question is whether DAMON can be extended
to system level monitoring to fulfill these use-cases. Address space
monitoring is a core concept in DAMON, and it has implemented address
space based optimizations (i.e. dividing the address space into
regions, assuming locality within regions, random sampling within
regions instead of looking at each page, and dynamically adjusting
regions). There is a followup proposal on monitoring the physical
address space in DAMON. However, for systems running multiple
workloads, the address space optimizations core to DAMON would be
ineffective.
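
To make the region based optimizations mentioned above concrete, here
is a minimal Python sketch of the idea. It is not DAMON's actual code:
the names (Region, sample_regions, merge_regions, split_regions), the
was_accessed() callback standing in for an accessed-bit check, the
4 KiB page size and the merge threshold are all assumptions made for
illustration only.

```python
import random
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class Region:
    start: int            # inclusive, page aligned
    end: int              # exclusive, page aligned
    nr_accesses: int = 0  # number of samples that found the region accessed


def sample_regions(regions, was_accessed, rng):
    """Check one randomly chosen page per region instead of every page.

    This relies on the locality assumption: pages within a region are
    expected to have similar access frequency.
    """
    for r in regions:
        nr_pages = (r.end - r.start) // PAGE_SIZE
        addr = r.start + rng.randrange(nr_pages) * PAGE_SIZE
        if was_accessed(addr):  # hypothetical accessed-bit check
            r.nr_accesses += 1


def merge_regions(regions, threshold):
    """Merge adjacent regions whose access counts differ by <= threshold."""
    merged = []
    for r in regions:
        last = merged[-1] if merged else None
        if (last is not None and last.end == r.start
                and abs(last.nr_accesses - r.nr_accesses) <= threshold):
            last_sz, r_sz = last.end - last.start, r.end - r.start
            # Keep a size-weighted average of the two access counts.
            last.nr_accesses = (last.nr_accesses * last_sz
                                + r.nr_accesses * r_sz) // (last_sz + r_sz)
            last.end = r.end
        else:
            merged.append(Region(r.start, r.end, r.nr_accesses))
    return merged


def split_regions(regions, rng):
    """Split each multi-page region in two at a random page boundary."""
    out = []
    for r in regions:
        nr_pages = (r.end - r.start) // PAGE_SIZE
        if nr_pages < 2:
            out.append(Region(r.start, r.end, r.nr_accesses))
            continue
        mid = r.start + rng.randrange(1, nr_pages) * PAGE_SIZE
        out.append(Region(r.start, mid, r.nr_accesses))
        out.append(Region(mid, r.end, r.nr_accesses))
    return out
```

In this sketch the monitoring cost scales with the number of regions
rather than the number of pages, and merging only helps when
neighbouring addresses behave alike. One plausible reading of the
concern above is that this locality assumption gets weaker once the
monitored address space interleaves pages of several unrelated
workloads.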

There are discussions/brainstorming on supporting an abstract address
space based on LRUs, which is somewhat similar to the Multigen LRU [1]
proposal, but this idea is not well articulated yet. BTW, Multigen LRU
[1] is another similar proposal, but it targets one specific use-case,
i.e. memory reclaim (proactive reclaim). Anyway, I think we need more
brainstorming for a generalized solution to system level access
monitoring.

Regarding merging DAMON, I personally think there are users who might
be interested in only their virtual address space, and DAMON provides
a solution for such users. SeongJae can provide more details on
whether any big user other than Amazon is interested in the feature.
DAMON does not expose stable APIs at the moment, so these can be
changed later if needed. I think it is ok to merge DAMON for some
exposure. However, I do want to make it clear that the solution space
is not complete. A solution for system level monitoring is still
needed, which can be a future extension to DAMON or a more generalized
Multigen LRU.