Re: [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints

From: SUVONOV BUNYOD

Date: Thu Apr 23 2026 - 20:46:39 EST


Thank you for reviewing, Shakeel.

> Do we need to trace highest_zoneidx at the end? Can it change within
> balance_pgdat()?

highest_zoneidx does not change within a balance_pgdat() invocation. It
is passed in as an argument and remains the classzone bound used for the
balancing checks throughout the function.

I kept highest_zoneidx in the end tracepoint to make the outcome event
self-contained. Correlating begin and end events is possible in
principle, but under sustained memory pressure kswapd reclaim can be
frequent enough that consumers may prefer to analyze end events directly.
Depending on begin/end matching is also less convenient and less robust
when events are filtered or trace records are dropped.

Since nr_reclaimed and the final order are only known at the end, having
highest_zoneidx there allows end-only analysis without correlating with
the begin event.

For example, it lets users answer questions like:
- this pass reclaimed too much or too little memory; what highest_zoneidx
  did that result correspond to?
- how much reclaim was done when balancing up to ZONE_NORMAL vs other
  classzone bounds?
- when highest_zoneidx == ZONE_NORMAL, how often did reclaim finish at
  order=0?
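
As a rough illustration of such end-only analysis, a consumer script could
look something like the sketch below. The event name
mm_vmscan_balance_pgdat_end, the printed field layout, and the sample lines
are assumptions made up for this example; the actual output depends on the
TP_printk() format in the patch. The zone index 2 is used only as a
placeholder value, since the numeric value of ZONE_NORMAL depends on the
kernel configuration.

```python
import re
from collections import defaultdict

# Hypothetical end-event format: the event name and field order are
# assumptions for illustration, not taken from the patch itself.
END_RE = re.compile(
    r"mm_vmscan_balance_pgdat_end: nid=(?P<nid>\d+) order=(?P<order>\d+) "
    r"highest_zoneidx=(?P<zidx>\d+) nr_reclaimed=(?P<nr>\d+)"
)


def summarize_end_events(lines):
    """Aggregate end events per highest_zoneidx, ignoring begin events."""
    stats = defaultdict(lambda: {"passes": 0, "nr_reclaimed": 0, "order0": 0})
    for line in lines:
        match = END_RE.search(line)
        if match is None:
            continue  # begin events and unrelated trace lines are skipped
        entry = stats[int(match.group("zidx"))]
        entry["passes"] += 1
        entry["nr_reclaimed"] += int(match.group("nr"))
        if int(match.group("order")) == 0:
            entry["order0"] += 1
    return dict(stats)


# Made-up sample lines in the assumed format; not real trace output.
sample = [
    "kswapd0-85 [002] 100.000001: mm_vmscan_balance_pgdat_begin: nid=0 order=3 highest_zoneidx=2",
    "kswapd0-85 [002] 100.000950: mm_vmscan_balance_pgdat_end: nid=0 order=0 highest_zoneidx=2 nr_reclaimed=512",
    "kswapd0-85 [002] 100.002100: mm_vmscan_balance_pgdat_end: nid=0 order=3 highest_zoneidx=2 nr_reclaimed=128",
    "kswapd0-85 [002] 100.003400: mm_vmscan_balance_pgdat_end: nid=0 order=0 highest_zoneidx=1 nr_reclaimed=64",
]

if __name__ == "__main__":
    for zidx, entry in sorted(summarize_end_events(sample).items()):
        print(zidx, entry)
```

Because each end event carries highest_zoneidx, the script never has to pair
it with a begin event, so it keeps working when begin events are filtered
out or dropped.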

So it is there because it provides context for the end-of-reclaim result.
Do you think this is sufficient justification? If not, then I can drop it
from the end tracepoint in v2.

----- Original Message -----
From: "Shakeel Butt" <shakeel.butt@xxxxxxxxx>
To: "Bunyod Suvonov" <b.suvonov@xxxxxxxxxxx>
Cc: akpm@xxxxxxxxxxxxxxxxxxxx, hannes@xxxxxxxxxxx, rostedt@xxxxxxxxxxx, mhiramat@xxxxxxxxxx, david@xxxxxxxxxx, mhocko@xxxxxxxxxx, "zhengqi arch" <zhengqi.arch@xxxxxxxxxxxxx>, ljs@xxxxxxxxxx, "mathieu desnoyers" <mathieu.desnoyers@xxxxxxxxxxxx>, linux-mm@xxxxxxxxx, linux-trace-kernel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
Sent: Friday, April 24, 2026 1:46:55 AM
Subject: Re: [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints

On Thu, Apr 23, 2026 at 06:37:53PM +0800, Bunyod Suvonov wrote:
> Vmscan has six main reclaim entry points: try_to_free_pages() for
> direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim,
> mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim()
> for node reclaim, shrink_all_memory() for hibernation reclaim, and
> balance_pgdat() for kswapd reclaim.
>
> All of them, except for shrink_all_memory() and balance_pgdat(), already
> have begin/end tracepoints. This makes it harder to trace which reclaim
> path is responsible for memory reclaim activity, because kswapd reclaim
> cannot be identified as cleanly as other reclaim entry points, even
> though it is the main background reclaim path under memory pressure.
> There may be no need to trace shrink_all_memory() as it is primarily
> used during hibernation. So this patch adds the missing tracepoint pair
> for balance_pgdat().
>
> The begin tracepoint records the node id, requested reclaim order, and
> highest_zoneidx. The end tracepoint records the node id, reclaim order
> that balance_pgdat() finished with, highest_zoneidx, and nr_reclaimed.

Do we need to trace highest_zoneidx at the end? Can it change within
balance_pgdat()?

> Together, they show the requested reclaim order and zone bound, whether
> reclaim fell back to a lower order, and how much reclaim work was done.
>
> Signed-off-by: Bunyod Suvonov <b.suvonov@xxxxxxxxxxx>

Overall looks good.