Re: [RFC PATCH v4 4/4] mm/damon: add PA-mode cache for eligible memory detection lag

From: SeongJae Park

Date: Tue Feb 24 2026 - 00:55:02 EST


On Mon, 23 Feb 2026 12:32:32 +0000 Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxx> wrote:

> In PA-mode, DAMON needs time to re-detect hot memory at new physical
> addresses after migration. This causes the goal metrics to temporarily
> show incorrect values until detection catches up.

I agree this can happen, and it could be problematic on some setups.

>
> Add an eligible cache mechanism to compensate for this detection lag:
>
> - Track migration deltas per node using a rolling window that
> automatically expires old data
> - Use direction-aware adjustment: for target nodes (receiving memory),
> use max(detected, predicted) to ensure migrated memory is counted
> even before detection catches up; for source nodes (losing memory),
> use predicted values when detection shows unreliable low values
> - Maintain the zero-sum property across nodes to preserve total
> eligible memory
> - Include cooldown mechanism to keep cache active while detection
> stabilizes after migration stops
> - Add time-based expiry to clear stale cache data when no migration
> occurs for a configured period
>
> The cache uses max_eligible tracking to handle detection oscillation,
> prioritizing peak observed values over potentially stale snapshots.
> A threshold check prevents quota oscillation when detection swings
> between zero and small values.
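
To check my reading of the direction-aware adjustment described above, here is
a minimal sketch of how I understand it. This is not code from the patch. The
struct, the function names, and the `low_threshold` parameter are all
hypothetical, and I'm assuming "unreliable low values" means "detected is below
some threshold".

```c
/* Hypothetical per-node state; NOT the data structures used by the patch. */
struct node_estimate {
	unsigned long detected;	/* bytes currently detected as eligible */
	unsigned long cached;	/* eligible bytes remembered before migration */
	long migrated_delta;	/* net bytes migrated in (+) or out (-) */
};

/* Predicted eligible bytes: cached value plus the net migration delta. */
static unsigned long predicted_eligible(const struct node_estimate *n)
{
	unsigned long out;

	if (n->migrated_delta >= 0)
		return n->cached + (unsigned long)n->migrated_delta;
	out = (unsigned long)(-n->migrated_delta);
	/* Clamp at zero instead of underflowing. */
	return out > n->cached ? 0 : n->cached - out;
}

/*
 * Direction-aware adjustment as I read the description:
 * - target node (net inflow): max(detected, predicted)
 * - source node (net outflow): predicted, but only when detected looks
 *   unreliably low (assumed here: below low_threshold)
 * - otherwise: trust detection
 */
static unsigned long adjusted_eligible(const struct node_estimate *n,
				       unsigned long low_threshold)
{
	unsigned long predicted = predicted_eligible(n);

	if (n->migrated_delta > 0)
		return n->detected > predicted ? n->detected : predicted;
	if (n->migrated_delta < 0 && n->detected < low_threshold)
		return predicted;
	return n->detected;
}
```

With this reading, a target node with cached=100, delta=+50 and detected=100
reports 150, while a source node reports the predicted value only while
detection stays below the threshold. The sketch leaves out the rolling window,
cooldown, and expiry parts.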

But I feel this solution might be too overfit to a specific setup.

>
> Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxx>
> ---
> include/linux/damon.h | 45 +++++
> mm/damon/core.c | 421 +++++++++++++++++++++++++++++++++++----
> mm/damon/sysfs-schemes.c | 30 +++
> 3 files changed, 460 insertions(+), 36 deletions(-)

The size of the change is quite big. I'm now curious if the problem is
significant enough for this size of change, and if this solution is the only
and the best one.

First of all, I'm curious if the problem is that significant. I assume you may
have seen the issue on the test setup that you shared in the cover letter.
From my understanding of the cover letter of this patch series, however, you
are testing this on a setup having two complementary schemes, and you use the
TEMPORAL tuner. The motivation of the TEMPORAL tuner was setups that have no
factor that moves the quota goal value without additional intervention. In a
complementary schemes setup, the schemes become such factors for each other.
In that case, the TEMPORAL tuner might be worse in terms of the size of
temporal oscillations. I don't know the details of your test setup, but I
suspect the use of the TEMPORAL tuner might have made the issue look bigger
than it really is.

I also assume real-world users may use DAMON with auto-tuning mostly because
they don't know the access pattern of the system and assume it will be
dynamic. In that case, even if we perfectly solve the issue, some oscillation
will still happen. So, I think the issue in the real world might be smaller
than what we can find on some specific test setups.

Meanwhile, the node_[in]eligible_mem_bp concept makes sense to me. I'm worried
that this patch may be unnecessarily delaying the progress of the main change.

So, unless we have clear evidence of the significance of this issue, I'd prefer
dropping this for now. After that, if the issue turns out to be significant, or
this solution is proven to be significantly beneficial, either on your next,
more realistic test setup or in real-world usage after the main change is
upstreamed, we can revisit. What do you think?


Thanks,
SJ

[...]