Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk

From: SeongJae Park

Date: Mon May 18 2026 - 21:28:56 EST


On Sun, 17 May 2026 22:54:18 -0700 Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxx> wrote:

> On Sun, May 17, 2026 at 4:43 PM SeongJae Park <sj@xxxxxxxxxx> wrote:
> >
> > On Sat, 16 May 2026 14:03:57 -0700 Ravi Jonnalagadda <ravis.opensrc@xxxxxxxxx> wrote:
> >
> > > On populated physical address ranges the pageblock skip optimization
> > > alone is insufficient — most pageblocks contain at least one allocated
> > > page, so the walk still iterates millions of PFNs.
> >
> > So my questions on the fourth patch of this series also apply here,
> > especially those about the assumption that systems have most of their
> > memory free. I will hold off on digging deeper here until the high-level
> > discussion is completed.
> >
> Hello SJ,
>
> Stepping back to look at this with fresh eyes, I think this
> patch is in the same bucket as patches 1 and 3 (full background
> on the patch 3 thread): it came out of the same parallel debug
> effort, where I was seeing long walks during the startup
> transient on a multi-hundred-GB monitored target -- before
> kdamond_split_regions() and damon_apply_min_nr_regions() had
> trimmed the initial regions down -- and was unsure whether
> those long walks were contributing to the NMI-side
> responsiveness issues I was chasing.
>
> Once the actual NMI problem was fixed and the per-region work
> in steady state is bounded by DAMON's region splitting (and by
> the scheme's quota when one is set), the per-call cost in
> damon_pa_migrate() is already small enough that the budget
> isn't doing useful work. cond_resched() after damon_migrate_pages()
> covers the preemption case.
>
> If a real workload later shows a per-region walk long
> enough to matter, I'll re-evaluate then with concrete numbers.

Sounds good!

FYI, many parts of DAMON are designed on the assumption that it will be used in
production environments that run long-running workloads and prefer stability.
That helps it produce good results in the long run, but it also makes DAMON
difficult to understand in the short term, especially in lab environments.

I learned that thanks to users including you, and therefore recently developed
the multiple quota tuning logics and the failed regions charge ratio. I feel
this DAMON limitation may have contributed to the confusion in this case.
Sorry if that was the case, and please feel free to share your pain points and
improvement ideas. Every user's use case, including yours, matters!


Thanks,
SJ

[...]