Re: [PATCH 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops
From: Josh Law
Date: Sun Mar 22 2026 - 17:47:26 EST
On 22 March 2026 21:44:18 GMT, SeongJae Park <sj@xxxxxxxxxx> wrote:
>Hello Josh,
>
>On Sun, 22 Mar 2026 18:46:40 +0000 Josh Law <objecting@xxxxxxxxxxxxx> wrote:
>
>> Currently, kdamond_apply_schemes() iterates over all targets, then over all
>> regions, and finally calls damon_do_apply_schemes() which iterates over
>> all schemes. This nested structure causes scheme-level invariants (such as
>> time intervals, activation status, and quota limits) to be evaluated inside
>> the innermost loop for every single region.
>>
>> If a scheme is inactive, has not reached its apply interval, or has already
>> fulfilled its quota (quota->charged_sz >= quota->esz), the kernel still
>> needlessly iterates through thousands of regions only to repeatedly
>> evaluate these same scheme-level conditions and continue.
>>
>> This patch inlines damon_do_apply_schemes() into kdamond_apply_schemes()
>> and inverts the loop ordering. It now iterates over schemes on the outside,
>> and targets/regions on the inside.
>>
>> This allows the code to evaluate scheme-level limits once per scheme.
>> If a scheme's quota is met or it is inactive, we completely bypass the
>> O(Targets * Regions) inner loop for that scheme. This drastically reduces
>> unnecessary branching, cache thrashing, and CPU overhead in the kdamond
>> hot path.
>
>That makes sense in high level. But, this will make a kind of behavioral
>difference that could be user-visible. I am failing at finding a clear use
>case that really depends on the old behavior. But, still it feels like not a
>small change to me.
>
>So, I'd like to be conservative to this change, unless there are good evidences
>showing very clear and impactful real world benefits. Can you share such
>evidences if you have?
>
>
>Thanks,
>SJ
>
>[...]
Hello,

Here are some benchmark results for both patches, indicating the need for these changes:

Benchmarking Patch 2 (Division elimination) with 1024 different attrs...
  Old (with division):    0.376372 s
  New (cached value):     0.265899 s
  Speedup:                1.42x

Benchmarking Patch 1 (Loop inversion) with 10 schemes, 5 targets, 2000 regions...
  Old (nested regions):   0.055627 s
  New (inverted schemes): 0.016167 s
  Speedup:                3.44x
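
For reference, the difference between the two loop orders in Patch 1 can be sketched in userspace C roughly as follows. This is only an illustration under simplifying assumptions, not the kernel code and not the benchmark harness that produced the numbers above: `struct scheme`, `scheme_skippable()`, `apply_region_major()` and `apply_scheme_major()` are hypothetical names, and the sketch never charges the quota, so it does not model the behavioral difference SJ raised.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; not the kernel's struct damos. */
struct scheme {
	bool active;
	unsigned long charged_sz;	/* quota already used */
	unsigned long esz;		/* effective quota size */
};

/* Scheme-level check; counts how many times it is evaluated. */
static bool scheme_skippable(const struct scheme *s, unsigned long *nr_checks)
{
	(*nr_checks)++;
	return !s->active || s->charged_sz >= s->esz;
}

/* Old order: targets and regions outside, schemes inside. */
static unsigned long apply_region_major(const struct scheme *schemes,
		size_t nr_schemes, size_t nr_targets, size_t nr_regions,
		unsigned long *nr_checks)
{
	unsigned long applied = 0;
	size_t t, r, s;

	for (t = 0; t < nr_targets; t++) {
		for (r = 0; r < nr_regions; r++) {
			for (s = 0; s < nr_schemes; s++) {
				if (scheme_skippable(&schemes[s], nr_checks))
					continue;
				applied++;	/* placeholder for applying the scheme */
			}
		}
	}
	return applied;
}

/* New order: schemes outside, so the check runs once per scheme. */
static unsigned long apply_scheme_major(const struct scheme *schemes,
		size_t nr_schemes, size_t nr_targets, size_t nr_regions,
		unsigned long *nr_checks)
{
	unsigned long applied = 0;
	size_t t, r, s;

	for (s = 0; s < nr_schemes; s++) {
		if (scheme_skippable(&schemes[s], nr_checks))
			continue;	/* skips the whole targets * regions walk */
		for (t = 0; t < nr_targets; t++)
			for (r = 0; r < nr_regions; r++)
				applied++;	/* placeholder for applying the scheme */
	}
	return applied;
}
```

In this sketch both functions apply the same number of (scheme, region) pairs, but the region-major version evaluates the scheme-level check once per region per scheme, while the scheme-major version evaluates it once per scheme. Once the quota is actually charged inside the loop, the two orders can pick different regions, which is the user-visible difference mentioned above.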