Re: [PATCH 0/8] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting
From: Joshua Hahn
Date: Tue Mar 03 2026 - 12:52:09 EST
On Mon, 2 Mar 2026 13:31:32 -0800 Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> On Thu, Feb 26, 2026 at 11:29 AM Joshua Hahn <joshua.hahnjy@xxxxxxxxx> wrote:
[...snip...]
> > Introduce a new per-zpdesc array of objcg pointers to track
> > per-memcg-lruvec memory usage by zswap, while leaving zram users
> > unaffected.
[...snip...]
Hi Nhat! I hope you are doing well :-) Thank you for taking a look!
> I might have missed it and this might be in one of the latter patches,
> but could also add some quick and dirty benchmark for zswap to ensure
> there's no or minimal performance implications? IIUC there is a small
> amount of extra overhead in certain steps, because we have to go
> through zsmalloc to query objcg. Usemem or kernel build should suffice
> IMHO.
Yup, this was one of my concerns too. I tried to do a somewhat comprehensive
analysis below; hopefully it paints a good picture of what's happening.
Spoilers: there don't seem to be any significant regressions (< 1%),
and any regressions are within a small fraction of the standard deviation.
One thing I have noticed is that there is a tangible reduction in
standard deviation for some of these benchmarks. I can't exactly pinpoint
why this is happening, but I'll take it as a win :p
> To be clear, I don't anticipate any observable performance change, but
> it's a good sanity check :) Besides, can't be too careful with stress
> testing stuff :P
For sure. I should have done these and included them in the original RFC,
but I think I might have been too eager to get the RFC out :-)
Will include them in the second version of the series!
All the experiments below were done on a 2-NUMA-node system. The data is
quite compressible, which I think makes sense for measuring the overhead of
accounting.
Benchmark 1
Allocating 2G of memory to one node with 1G memory.high. Average across 10 trials.
+-------------------------+---------+----------+
| | average | stddev |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 8887.82 | 362.40 |
| Baseline + Series | 8944.16 | 356.45 |
+-------------------------+---------+----------+
| Delta | +0.634% | -1.642% |
+-------------------------+---------+----------+
Benchmark 2
Allocating 2G of memory to one node with 1G memory.high, churning 5x through
the memory. Average across 5 trials.
+-------------------------+----------+----------+
| | average | stddev |
+-------------------------+----------+----------+
| Baseline (11439c4635ed) | 31152.96 | 166.23 |
| Baseline + Series | 31355.28 | 64.86 |
+-------------------------+----------+----------+
| Delta | +0.649% | -60.981% |
+-------------------------+----------+----------+
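As an illustration, the churn step in a workload like the one above can be
sketched roughly like this (the function name, sizes, and pass count here are
mine for illustration, not the actual harness used for the numbers above):

```python
import mmap

def churn(size_bytes, passes, page=4096):
    # Touch every page of an anonymous mapping `passes` times; with a
    # memory.high limit below size_bytes, each pass forces reclaim and
    # (with zswap enabled) compress/decompress cycles on evicted pages.
    buf = mmap.mmap(-1, size_bytes)
    for _ in range(passes):
        for off in range(0, size_bytes, page):
            buf[off] = (buf[off] + 1) & 0xFF  # dirty one byte per page
    return buf

# Tiny sizes here just to show the shape; the runs above used 2G and 5 passes.
buf = churn(1 << 20, 2)
```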
Benchmark 3
Allocating 2G of memory split across 2 nodes, with 1G memory.high.
Average across 5 trials.
+-------------------------+---------+----------+
|                         | average | stddev   |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 16101.6 | 174.18   |
| Baseline + Series       | 16022.4 | 117.17   |
+-------------------------+---------+----------+
| Delta                   | -0.492% | -32.731% |
+-------------------------+---------+----------+
Benchmark 4
Reading stat files 10000 times under memory pressure
memory.stat
+-------------------------+---------+----------+
|                         | average | stddev   |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 24524.4 | 501.7    |
| Baseline + Series       | 24807.2 | 444.53   |
+-------------------------+---------+----------+
| Delta                   | +1.153% | -11.395% |
+-------------------------+---------+----------+
memory.numa_stat
+-------------------------+---------+----------+
|                         | average | stddev   |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 24807.2 | 444.53   |
| Baseline + Series       | 23837.6 | 521.68   |
+-------------------------+---------+----------+
| Delta                   | -3.905% | +17.355% |
+-------------------------+---------+----------+
/proc/vmstat
+-------------------------+---------+----------+
|                         | average | stddev   |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 24793.6 | 285.26   |
| Baseline + Series       | 23815.6 | 553.44   |
+-------------------------+---------+----------+
| Delta                   | -3.945% | +94.012% |
+-------------------------+---------+----------+
^^^ A big increase in standard deviation here, although there is some
decrease in the average time. This is probably the most notable change I've
seen from this patch.
node0/vmstat
+-------------------------+---------+----------+
|                         | average | stddev   |
+-------------------------+---------+----------+
| Baseline (11439c4635ed) | 24541.4 | 281.41   |
| Baseline + Series       | 24479   | 241.29   |
+-------------------------+---------+----------+
| Delta                   | -0.254% | -14.257% |
+-------------------------+---------+----------+
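For reference, the stat-read loop is nothing fancy; a sketch of the shape of
it looks roughly like this (the function name and iteration count are
illustrative, not the exact harness used for the numbers above):

```python
import os
import time

def time_stat_reads(path, n=10000):
    # Read the given stat file n times back to back and return the total
    # elapsed wall-clock time in milliseconds.
    start = time.monotonic()
    for _ in range(n):
        with open(path) as f:
            f.read()
    return (time.monotonic() - start) * 1000.0

# e.g. run against /proc/vmstat while the churn workload applies pressure
if os.path.exists("/proc/vmstat"):
    elapsed_ms = time_stat_reads("/proc/vmstat", n=100)  # small n here
```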
There are lots of testing results here; the averages are mostly negligible,
I think, but there are some non-negligible changes in standard deviation
going in both directions. I don't see anything too concerning off the top of
my head, but for the next version I'll try to do some more testing across
different machines as well (I don't have any machines with > 2 nodes, but
maybe I can do some tests on QEMU just to sanity check).
Thanks again, Nhat. Have a great day!
Joshua