Re: [PATCH] mm/damon: fix stale TLB young-state handling on arm64

From: SeongJae Park

Date: Mon May 25 2026 - 13:47:32 EST


On Mon, 25 May 2026 22:48:46 +0800 Kunwu Chan <kunwu.chan@xxxxxxxxx> wrote:

> From: Kunwu Chan <chentao@xxxxxxxxxx>

Thank you for this great patch!

>
> damon_ptep_mkold() clears the PTE Access Flag so that a later
> access will set it again and damon_folio_young() can observe it
> via pte_young().
>
> On arm64, however, ptep_test_and_clear_young() clears AF in the
> page tables without invalidating the corresponding TLB entry.
> Subsequent accesses can therefore continue hitting a stale TLB
> entry without a page table walk. The PTE AF bit stays clear,
> pte_young() reports false, and DAMON treats the region as
> unaccessed.
>
> folio_set_idle() does not help here. It updates only software
> state, and accesses through a stale TLB entry do not clear the
> idle flag.
>
> As a result, nr_accesses stays low regardless of the real access
> pattern. DAMOS schemes fail to match, WSS estimation reports
> zero, and actions like pageout never trigger.

You are correct. Nonetheless, we intentionally designed DAMON in this way, to
avoid the performance overhead from the added TLB flushes. Meanwhile, we
believed this is ok in terms of the monitoring results accuracy on real world
production environment, becasue such environments would have large amount of
working set that flushes TLB buffers anyway. We decided to take this way after
a measurement [1].

And apparently you found this behavior as problematic on a test environment
that having small size of working set. We found a similar issue in DAMON
selftest, and we updated the test [2] to simulate the expected real world
production environments, rather than changing DAMON.

But, this kind of question is recurring. In addition to the previous
discussion, there were a few private inqueries for this issue. And though the
real world production environment is the priamry target of DAMON, I understand
it is better to support testing environment, too. So, I think it is better to
make some changes for this issue, if it doesn't make other problems.

>
> Fix this by switching to ptep_clear_flush_young() and
> pmdp_clear_flush_young().
>
> On arm64 these perform the required TLB invalidation after
> clearing AF. The invalidation is deferred, but still sufficient
> for DAMON's sampling granularity.
>
> On x86, ptep_clear_flush_young() is equivalent to
> ptep_test_and_clear_young() for base pages, so there is no
> behavioral change. pmdp_clear_flush_young() additionally performs
> a flush at PMD level, matching the existing x86 implementation.
>
> On powerpc, riscv, and s390, the clear_flush variants currently
> map back to test_and_clear implementations, so this patch does not
> change their behavior.

This change seems much nicer and might be more optimized than my simple
implementation of tlb flush [1] that I tested before.

>
> Reproduced on arm64 (128 CPUs, 7.1.0-rc4):
>
> before:
> WSS estimation: 50th percentile error 100% (reported as zero)
> apply_interval: schemes never tried
>
> after:
> WSS estimation: 50th percentile error 0.08%
> apply_interval: passes

And nice test results. I guess you are referring to the tests in damon-tests?
Clarifying the context would be nice.

Also, have you had a chance to measure the performance impact?

So, I'd like to have this change. But, unless we have very clear evidence
showing this change is not increasing the performance overhead, I'd prefer
making this as an optional feature.

For the user interface, we could add a new sysfs file for the option, say,
'flush_sample_tlb' under 'monitoring_attrs' directory.

For long term, I'm planning [1] to extend the data attributes monitoring
feature so that data access becomes just one of the attributes. Once it is
done, we could control this tlb flush option using the probes interface.

I was initially thinking about asking Kunwu to wait until the data attributes
monitoring extension is done, and add this tlb flush option on top of that.
Because, otherwise, we may need to deprecate 'flush_sample_tlb' after the
extension is done. But, we will anyway need to deprecate a few interfaces
including 'nr_accesses'. Doing the deprecation of 'flush_sample_tlb' together
with it shouldn't be huge amount of overhead.

So, unless Kunwu and Lian has other concerns, I'd suggest the
'flush_sample_tlb' path.

[...]

[1] https://lore.kernel.org/20200403103059.12762-1-sjpark@xxxxxxxxxx/
[2] https://lore.kernel.org/20260117020731.226785-3-sj@xxxxxxxxxx/


Thanks,
SJ