[RFC PATCH 2/2] zsmalloc: chain-length configuration should consider other metrics
From: Sergey Senozhatsky
Date: Wed Dec 31 2025 - 20:39:12 EST
This is the first step towards re-thinking the optimization strategy
for chain-size configuration (the chain size being the number of
0-order physical pages a zspage chains together for optimal
performance). Currently, we consider only one metric - "wasted"
memory - and try various chain length configurations in order to
find the one with the minimal wasted space. However, this strategy
ignores the fact that our optimization space is not
single-dimensional. When we increase the zspage chain length we at
the same time increase the number of spanning objects (objects that
span two physical pages). Such objects slow down read() operations
because zsmalloc needs to kmap both pages and memcpy the object's
chunks, which increases CPU usage and battery drain.
We, most likely, need to consider numerous metrics and optimize
in a multi-dimensional space. These can be wired in later on; for
now we just add a heuristic to increase the zspage chain length
only if doing so yields substantial memory savings. We can tune
these threshold values (there is a simple user-space tool [2] to
experiment with those knobs), but what we currently have is already
interesting enough. Where does this bring us? Using a synthetic
test [1], which produces byte-to-byte comparable workloads, on a
4K PAGE_SIZE, chain size 10 system:
BASE
====
zsmalloc_test: num write objects: 339598
zsmalloc_test: pool pages used 175111, total allocated size 698213488
zsmalloc_test: pool memory utilization: 97.3
zsmalloc_test: num read objects: 339598
zsmalloc_test: spanning objects: 110377, total memcpy size: 278318624
PATCHED
=======
zsmalloc_test: num write objects: 339598
zsmalloc_test: pool pages used 175920, total allocated size 698213488
zsmalloc_test: pool memory utilization: 96.8
zsmalloc_test: num read objects: 339598
zsmalloc_test: spanning objects: 103256, total memcpy size: 265378608
At the price of a 0.5% increase in pool memory usage there was a
6.5% reduction in the number of spanning objects (4.6% fewer copied
bytes). Note, the results are specific to this particular test case.
The savings are not uniformly distributed: according to [2], for
some size classes the number of spanning objects per-zspage goes
down from 7 to 0 (e.g. size class 368), for others from 4 to 2
(e.g. size class 640). So the actual memcpy savings are
data-pattern dependent, as always.
[1] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/0001-zsmalloc-add-zsmalloc_test-module.patch
[2] https://github.com/sergey-senozhatsky/simulate-zsmalloc/blob/main/simulate_zsmalloc.c
Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
---
mm/zsmalloc.c | 39 +++++++++++++++++++++++++++++++--------
1 file changed, 31 insertions(+), 8 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5e7501d36161..929db7cf6c19 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2000,22 +2000,45 @@ static int zs_register_shrinker(struct zs_pool *pool)
static int calculate_zspage_chain_size(int class_size)
{
int i, min_waste = INT_MAX;
- int chain_size = 1;
+ int best_chain_size = 1;
if (is_power_of_2(class_size))
- return chain_size;
+ return best_chain_size;
for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
- int waste;
+ int curr_waste = (i * PAGE_SIZE) % class_size;
- waste = (i * PAGE_SIZE) % class_size;
- if (waste < min_waste) {
- min_waste = waste;
- chain_size = i;
+ if (curr_waste == 0)
+ return i;
+
+ /*
+ * Accept the new chain size if:
+ * 1. The current best is wasteful (> 10% of zspage size),
+ * accept anything that is better.
+ * 2. The current best is efficient, accept only significant
+ * (25%) improvement.
+ */
+ if (min_waste * 10 > best_chain_size * PAGE_SIZE) {
+ if (curr_waste < min_waste) {
+ min_waste = curr_waste;
+ best_chain_size = i;
+ }
+ } else {
+ if (curr_waste * 4 < min_waste * 3) {
+ min_waste = curr_waste;
+ best_chain_size = i;
+ }
}
+
+ /*
+ * If the current best chain has low waste (approx < 1.5%
+ * relative to zspage size) then accept it right away.
+ */
+ if (min_waste * 64 <= best_chain_size * PAGE_SIZE)
+ break;
}
- return chain_size;
+ return best_chain_size;
}
/**
--
2.52.0.351.gbe84eed79e-goog