Re: [PATCH -next v6 0/2] Make memory reclamation measurable

From: Bixuan Cui
Date: Thu Mar 07 2024 - 02:41:11 EST




On 2024/2/21 15:44, Michal Hocko wrote:
> It would be really helpful to have more details on why we need those trace points. It is my understanding that you would like to have a more fine grained numbers for the time duration of different parts of the reclaim process. I can imagine this could be useful in some cases but is it useful enough and for a wider variety of workloads? Is that worth a dedicated static tracepoints? Why an add-hoc dynamic tracepoints or BPF for a very special situation is not sufficient? In other words, tell us more about the usecases and why is this generally useful.
Thank you for your reply. I'm sorry that I forgot to describe the reasons in detail.

Memory reclamation usually occurs under high memory pressure (or low memory) and is performed by kswapd. In embedded systems, CPU resources are limited, and it is common for kswapd and critical processes (which typically require a large amount of memory and trigger memory reclamation) to compete for CPU resources. This competition slows down the critical process, increasing its execution time and causing lag, such as dropped frames or slower startup times in mobile games.
Currently, with the help of kernel trace events or tools like Perfetto, we can only see that kswapd is competing for CPU and how often memory reclamation is triggered. We do not have detailed information or metrics about memory reclamation, such as the duration and amount of each reclamation, or who is releasing memory (super_cache, f2fs, ext4), etc. Without this information, we cannot locate the cause of the problems above.

So far, this patch has helped us solve two real performance problems (kswapd preempting the CPU and causing game delays):
1. Increased memory allocation in the game (across different versions) has degraded kswapd performance.
This was found by calculating the total amount of Reclaim(page) during the game startup phase.

2. The adoption of a different file system in the new system version has resulted in a slower reclamation rate.
This was discovered through the change in OBJ_NAME, for example from super_cache_scan to ext4_es_scan.

Going forward, the memory reclamation rate can also be calculated to evaluate the memory performance of different versions.
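
As a rough illustration of the kind of post-processing described above, here is a minimal Python sketch that totals reclaimed pages per shrinker name and derives a reclamation rate. The record layout (name, pages, duration in microseconds) and the sample values are hypothetical and hand-written for illustration; they are not the output format of the tracepoints in this patch.

```python
from collections import defaultdict

# Hypothetical, hand-written records: (shrinker_name, pages_reclaimed, duration_us).
# Illustrative values only, not real trace output.
SAMPLES = [
    ("super_cache_scan", 128, 400),
    ("ext4_es_scan", 32, 800),
    ("super_cache_scan", 64, 200),
]


def reclaim_stats(records):
    """Return {name: (total_pages, total_us, pages_per_ms)} per shrinker name."""
    totals = defaultdict(lambda: [0, 0])
    for name, pages, duration_us in records:
        totals[name][0] += pages
        totals[name][1] += duration_us

    stats = {}
    for name, (pages, us) in totals.items():
        # Guard against zero-length intervals to avoid division by zero.
        rate = pages * 1000 / us if us else 0.0
        stats[name] = (pages, us, rate)
    return stats


if __name__ == "__main__":
    for name, (pages, us, rate) in sorted(reclaim_stats(SAMPLES).items()):
        print(f"{name}: {pages} pages in {us} us ({rate:.1f} pages/ms)")
```

Comparing these per-name totals and rates between two system versions is one way a change such as super_cache_scan giving way to ext4_es_scan could show up.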



The main reasons for adding static tracepoints are:
1. To subdivide the time spent in the shrinker->count_objects() and shrinker->scan_objects() functions within the do_shrink_slab function. With a BPF kprobe, we can only measure the time spent in the do_shrink_slab function as a whole.
2. When tracing frequently called functions, static tracepoints (BPF tp/tracepoint) have a lower performance impact than dynamic tracepoints (BPF kprobe).
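
To make the first point concrete, here is a minimal Python sketch of how begin/end events around each phase could be paired to split the time between count_objects and scan_objects. The event tuple layout (timestamp, pid, phase, kind) and the sample values are assumptions made for illustration only; the actual tracepoint names and fields are those defined in the patch, not the ones shown here.

```python
# Hypothetical events: (timestamp_us, pid, phase, kind), kind is "begin" or "end".
# Illustrative values only, not real trace output.
EVENTS = [
    (100, 42, "count_objects", "begin"),
    (103, 42, "count_objects", "end"),
    (103, 42, "scan_objects", "begin"),
    (150, 42, "scan_objects", "end"),
    (200, 42, "count_objects", "begin"),
    (202, 42, "count_objects", "end"),
]


def phase_durations(events):
    """Pair begin/end events per (pid, phase) and sum elapsed time per phase."""
    open_at = {}  # (pid, phase) -> timestamp of the pending "begin"
    totals = {}   # phase -> accumulated microseconds
    for ts, pid, phase, kind in sorted(events):
        key = (pid, phase)
        if kind == "begin":
            open_at[key] = ts
        elif kind == "end" and key in open_at:
            # An "end" without a matching "begin" is ignored.
            totals[phase] = totals.get(phase, 0) + ts - open_at.pop(key)
    return totals


if __name__ == "__main__":
    for phase, us in phase_durations(EVENTS).items():
        print(f"{phase}: {us} us")
```

With only a probe on do_shrink_slab itself, the two phases would collapse into a single interval per call, which is the limitation described in point 1.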

Thanks
Bixuan Cui