On Mon, Apr 07, 2025 at 12:23:16PM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
> Yes, the variation I saw was on the same system with multiple runs. The
> memory.current values are read by the time the parent cgroup memory
> usage reaches near the target 50M, but how much memory remains in each
> child varies from run to run. You can say that it is somewhat chaotic.
>
> Child  Actual usage  Expected usage  %err
> -----  ------------  --------------  ----
>   1      16990208       22020096    -12.9%
>   1      17252352       22020096    -12.1%
>   0      37699584       30408704    +10.7%
>   1      14368768       22020096    -21.0%
>   1      16871424       22020096    -13.2%
>
> The current 10% error tolerance might have been right at the time
> test_memcontrol.c was first introduced in the v4.18 kernel, but memory
> reclaim has certainly evolved quite a bit since then, which may result
> in a bit more run-to-run variation than previously expected.
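
For context, IIUC the 10% comes from values_close() in
tools/testing/selftests/cgroup/cgroup_util.c, i.e. the difference must
stay within err percent of the sum of the two values (which also looks
like how the %err column above was computed). A standalone sketch,
reproduced from memory so it may not match the tree exactly:

#include <stdio.h>
#include <stdlib.h>

/* Sketch of the selftest's tolerance check: |a - b| must not exceed
 * err percent of (a + b).  Written from memory, may differ in detail
 * from values_close() in cgroup_util.c.
 */
static int values_close(long a, long b, int err)
{
        return labs(a - b) <= (a + b) / 100 * err;
}

int main(void)
{
        /* First row of the table above: 12.9% off, fails a 10% limit. */
        printf("%d\n", values_close(16990208, 22020096, 10));  /* 0 */
        /* Third row: 10.7% off, still (just) outside the 10% limit. */
        printf("%d\n", values_close(37699584, 30408704, 10));  /* 0 */
        return 0;
}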
I like Roman's suggestion of nr_cpus dependence but I assume your
variations were still on the same system, weren't they?
Is it fair to say that reclaim is chaotic [1]? I wonder what may cause
variations between separate runs of the test.
Would it help to `echo 3 >/proc/sys/vm/drop_caches` before each run to
have more stable initial conditions? (Not sure if it's OK in selftests.)
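
Concretely, something like this in the test setup is what I mean
(untested sketch; it needs root and is system-wide, which is exactly the
part I'm not sure selftests should do):

#include <fcntl.h>
#include <unistd.h>

/* Drop clean page cache plus reclaimable slab objects so every run
 * starts from comparable initial conditions.  Needs root and affects
 * the whole machine.
 */
static int drop_caches(void)
{
        int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);

        if (fd < 0)
                return -1;
        if (write(fd, "3", 1) != 1) {
                close(fd);
                return -1;
        }
        return close(fd);
}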
<del>Or sleep 0.5s to settle rstat flushing?</del> No, page_counters
don't suffer from that, but they do carry up to MEMCG_CHARGE_BATCH
pre-charged pages in each per-CPU stock.
So maybe drain the stock so that counters are precise after the test?
(Either by running a task in a dummy memcg on each CPU or via some
debugging API.)
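
A rough sketch of the first option, based on my understanding that
charging a different memcg on a CPU drains whatever was stocked there
for the previously cached one (and IIRC MEMCG_CHARGE_BATCH is 64 pages,
i.e. up to ~256 KiB of imprecision per CPU with 4 KiB pages). The
scratch cgroup path and helper names below are made up, error handling
trimmed:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Force a charge for a throw-away memcg on every CPU.  As I read
 * mm/memcontrol.c, charging a different memcg on a CPU returns the
 * pages stocked there for the previously cached memcg to its
 * page_counter, so memory.current of the test cgroups becomes precise.
 */
static void charge_dummy_on_cpu(int cpu, const char *dummy_procs)
{
        cpu_set_t set;
        char *buf;
        FILE *f;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set))
                exit(1);

        /* move this child into the scratch cgroup */
        f = fopen(dummy_procs, "w");
        if (!f)
                exit(1);
        fprintf(f, "%d\n", getpid());
        fclose(f);

        /* fault in some memory so a charge (and a stock switch) happens here */
        buf = malloc(1 << 20);
        if (buf)
                memset(buf, 1, 1 << 20);
        exit(0);
}

static void drain_percpu_stocks(const char *dummy_procs)
{
        int cpu, nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);

        for (cpu = 0; cpu < nr_cpus; cpu++) {
                pid_t pid = fork();

                if (!pid)
                        charge_dummy_on_cpu(cpu, dummy_procs);
                else if (pid > 0)
                        waitpid(pid, NULL, 0);
        }
}

int main(void)
{
        /* hypothetical scratch cgroup created (and removed) by the test */
        drain_percpu_stocks("/sys/fs/cgroup/memcg_drain/cgroup.procs");
        return 0;
}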