[PATCH] mm: madvise: fix uneven accounting of psi

From: Charan Teja Kalla
Date: Wed May 31 2023 - 07:10:43 EST


A folio is marked Workingset in two cases:
1) shrink_active_list() moves the folio from the active to the inactive
list.
2) The folio refaults and a workingset transition takes place.

When Workingset is set on a folio, PSI for memory can be accounted
a) when that folio is reclaimed and b) when that folio refaults.

Some clients do proactive reclaim through system calls such as
madvise(). Their folios can safely be treated as inactive, on the
assumption that the client knows the folios are not needed in the near
future and therefore wants them reclaimed. For such folios PSI is not
accounted uniformly:
a) A folio starts on the inactive list and moves to the active list as
a result of accesses. Workingset is not set on the folio, so
madvise(MADV_PAGEOUT) does not account it for PSI.

b) The same folio goes from inactive to active and then back to
inactive through shrink_active_list(). Workingset is set on the folio,
so madvise(MADV_PAGEOUT) accounts it for PSI.

c) The same folio is placed directly on the active list as a result of
a refault, having been a workingset folio prior to eviction. Workingset
is set on the folio, so madvise(MADV_PAGEOUT) accounts it for PSI.
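
For reference, a client-side proactive reclaim request of the kind
discussed above could look roughly like the sketch below. This is
illustrative userspace code and not part of the patch: reclaim_range()
and pageout_demo() are hypothetical helper names, and the fallback
value 21 for MADV_PAGEOUT assumes the generic Linux UAPI definition.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* exposes madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* assumed generic Linux UAPI value */
#endif

/*
 * Hypothetical helper: ask the kernel to reclaim [addr, addr + len).
 * addr must be page aligned. Returns 0 on success, -1 with errno set.
 */
static int reclaim_range(void *addr, size_t len)
{
	return madvise(addr, len, MADV_PAGEOUT);
}

/*
 * Map a few anonymous pages, dirty them, request reclaim and check that
 * the contents survive. Returns 1 if the advice was accepted, 0 if the
 * kernel rejected it, -1 on mmap failure or data mismatch.
 */
static int pageout_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE) * 4;
	unsigned char *p;
	int accepted, intact;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0x5a, len);			/* fault the pages in */
	accepted = (reclaim_range(p, len) == 0);

	/* MADV_PAGEOUT never discards data; reading may swap pages back in. */
	intact = (p[0] == 0x5a && p[len - 1] == 0x5a);

	munmap(p, len);
	return intact ? accepted : -1;
}
```

Whether the pages are actually written out depends on the kernel and on
swap being available; the call only expresses the client's hint.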

As described above, MADV_PAGEOUT on a folio is accounted for PSI in b)
and c) but not in a), which is inconsistent. Remove this inconsistency
by never accounting PSI for folios that are reclaimed through
madvise(MADV_PAGEOUT), which is done by clearing Workingset on the
folio. Clearing the flag was chosen on the assumption that the client
knows these folios are not in active use, which is why it is reclaiming
them, so they do not qualify as workingset folios. This is probably the
same reason why Workingset is not set on a folio by MADV_COLD but is
set by shrink_active_list(), even though both actions put the folio
onto the inactive list.
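
The three cases and the effect of the fix can be summarised with a toy
model. This is a minimal sketch, not kernel code: struct toy_folio,
enum toy_history and both functions are hypothetical names, and the
model assumes only what is stated above, namely that PSI can be
accounted for a reclaimed folio when Workingset is set on it.

```c
#include <stdbool.h>

/* Toy stand-in for the folio state discussed above, not struct folio. */
struct toy_folio {
	bool active;		/* on the active LRU list */
	bool workingset;	/* models the Workingset flag */
};

enum toy_history { HIST_A, HIST_B, HIST_C };

/* Build a folio whose history matches case a), b) or c) from the text. */
static struct toy_folio toy_folio_with_history(enum toy_history h)
{
	struct toy_folio f = { .active = false, .workingset = false };

	switch (h) {
	case HIST_A:	/* inactive -> active through accesses */
		f.active = true;
		break;
	case HIST_B:	/* as a), then deactivated by shrink_active_list() */
		f.active = false;
		f.workingset = true;
		break;
	case HIST_C:	/* refault of a former workingset folio */
		f.active = true;
		f.workingset = true;
		break;
	}
	return f;
}

/*
 * Models the MADV_PAGEOUT reclaim path: PSI is accounted only when
 * Workingset is set. 'patched' models clearing the flag before reclaim.
 * The folio is passed by value, so the caller's copy is not modified.
 */
static bool toy_pageout_accounts_psi(struct toy_folio f, bool patched)
{
	if (patched)
		f.workingset = false;
	return f.workingset;
}
```

In this model the unpatched path accounts PSI for HIST_B and HIST_C but
not for HIST_A, while the patched path accounts it for none of them.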

This patch was tested on Android on a Snapdragon SoC with 8GB RAM and
4GB of swap on zram, which has a 2GB backing device. The test case
launched several memory-hungry apps in sequence and proactively
reclaimed each app that went to the background using
madvise(MADV_PAGEOUT). We see ~40% lower total values of PSI memory
some and full when this patch is combined with [1].

[1] https://lore.kernel.org/all/20220214214921.419687-1-hannes@xxxxxxxxxxx/T/#u
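
A comparison of PSI totals like the one above can be made by reading
the "total=" fields of /proc/pressure/memory before and after a run.
The sketch below is one possible way to do that and is not the tooling
used for the test: psi_parse_total() and psi_reduction_pct() are
hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the "total=" field (stall time in
 * microseconds) from one line of /proc/pressure/memory, for example
 *   some avg10=0.00 avg60=0.00 avg300=0.00 total=12345
 * 'kind' is "some" or "full". Returns 0 on success, -1 if the line
 * does not start with 'kind' or has no parsable total.
 */
static int psi_parse_total(const char *line, const char *kind,
			   unsigned long long *total)
{
	size_t klen = strlen(kind);
	const char *p;

	if (strncmp(line, kind, klen) != 0 || line[klen] != ' ')
		return -1;

	p = strstr(line + klen, "total=");
	if (!p)
		return -1;

	return sscanf(p, "total=%llu", total) == 1 ? 0 : -1;
}

/* Relative reduction in percent from 'before' to 'after'. */
static double psi_reduction_pct(unsigned long long before,
				unsigned long long after)
{
	if (before == 0)
		return 0.0;
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

Each line read from the file would be passed to psi_parse_total() once
with "some" and once with "full", and the two totals compared between
runs with psi_reduction_pct().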

Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
---
mm/madvise.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 340125d..3410c39 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -409,8 +409,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
 					folio_putback_lru(folio);
-				else
+				else {
+					folio_clear_workingset(folio);
 					list_add(&folio->lru, &folio_list);
+				}
 			}
 		} else
 			folio_deactivate(folio);
@@ -503,8 +505,10 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (folio_isolate_lru(folio)) {
 				if (folio_test_unevictable(folio))
 					folio_putback_lru(folio);
-				else
+				else {
+					folio_clear_workingset(folio);
 					list_add(&folio->lru, &folio_list);
+				}
 			}
 		} else
 			folio_deactivate(folio);
--
2.7.4