On Tue, Apr 08, 2025 at 01:20:11PM +0400, Igor Belousov wrote:
> > > Now what's funny is that when I tried to compare how a 32-thread build
> > > would behave on an 8-core VM, I couldn't do it because it OOMs with
> > > zsmalloc as the zswap backend. With zblock it doesn't, though, and the
> > > results are:
> > > real 12m14.012s
> > > user 39m37.777s
> > > sys 14m6.923s
> > > Zswap: 440148 kB
> > > Zswapped: 924452 kB
> > > zswpin 594812
> > > zswpout 2802454
> > > zswpwb 10878
> > > It's LZ4 for all the test runs.
> >
> > Can you try zstd and let me know how it goes :)
>
> Sure. zstd/8 cores/make -j32:
>
> zsmalloc:
> real 7m36.413s
> user 38m0.481s
> sys 7m19.108s
> Zswap: 211028 kB
> Zswapped: 925904 kB
> zswpin 397851
> zswpout 1625707
> zswpwb 5126
>
> zblock:
> real 7m55.009s
> user 39m23.147s
> sys 7m44.004s
> Zswap: 253068 kB
> Zswapped: 919956 kB
> zswpin 456843
> zswpout 2058963
> zswpwb 3921
So zstd results in nearly double the compression ratio, which in turn
cuts total execution time *almost in half*.
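(For reference, Zswapped/Zswap from the figures above works out to
924452/440148 ~= 2.1 for LZ4/zblock versus 925904/211028 ~= 4.4 for
zstd/zsmalloc and 919956/253068 ~= 3.6 for zstd/zblock, and the zswpout
count drops from 2802454 to 1625707 and 2058963 respectively.)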
The numbers speak for themselves. Compression efficiency >>> allocator
speed, because compression efficiency ultimately drives the continuous
*rate* at which allocations need to occur. You're trying to optimize a
constant coefficient at the expense of a higher-order one, which is a
losing proposition.
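In case anyone wants to repeat the comparison, below is a rough helper
along the lines of what the runs above imply: it sets the zswap
compressor and zpool through the module parameters, runs a command
(e.g. make -j32), and prints the zswpin/zswpout/zswpwb deltas from
/proc/vmstat plus Zswap/Zswapped from /proc/meminfo and their ratio.
This is only a sketch of one possible setup, not the script Igor
actually used; it assumes zswap is enabled in the kernel, that "zblock"
is accepted as a zpool value once this series is applied, and that it
runs as root. The name zswap-bench.py is made up.

#!/usr/bin/env python3
# zswap-bench.py: rough sketch of a zswap benchmark run.
# Needs root to write the zswap module parameters.
import subprocess
import sys
import time

ZSWAP_PARAMS = "/sys/module/zswap/parameters"
VMSTAT_KEYS = ("zswpin", "zswpout", "zswpwb")
MEMINFO_KEYS = ("Zswap", "Zswapped")

def set_param(name, value):
    # e.g. compressor=zstd, zpool=zsmalloc (or zpool=zblock with this series)
    with open(f"{ZSWAP_PARAMS}/{name}", "w") as f:
        f.write(value)

def read_vmstat():
    # Pick the zswap counters out of /proc/vmstat ("zswpout 2802454" etc.)
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            if key in VMSTAT_KEYS:
                counters[key] = int(value)
    return counters

def read_meminfo():
    # Pick Zswap/Zswapped (in kB) out of /proc/meminfo
    sizes = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            if key in MEMINFO_KEYS:
                sizes[key] = int(rest.split()[0])
    return sizes

def main():
    if len(sys.argv) < 4:
        sys.exit(f"usage: {sys.argv[0]} <compressor> <zpool> <command...>")
    compressor, zpool, cmd = sys.argv[1], sys.argv[2], sys.argv[3:]

    set_param("compressor", compressor)  # lz4 or zstd
    set_param("zpool", zpool)            # zsmalloc or zblock
    set_param("enabled", "1")

    before = read_vmstat()
    start = time.monotonic()
    subprocess.run(cmd, check=True)      # e.g. make -j32
    elapsed = time.monotonic() - start
    after = read_vmstat()
    mem = read_meminfo()                 # point-in-time, like the numbers above

    print(f"elapsed {elapsed:.1f}s")
    for key in MEMINFO_KEYS:
        print(f"{key}: {mem[key]} kB")
    for key in VMSTAT_KEYS:
        print(f"{key} {after[key] - before[key]}")
    if mem["Zswap"]:
        print(f"Zswapped/Zswap ratio: {mem['Zswapped'] / mem['Zswap']:.2f}")

if __name__ == "__main__":
    main()

Invoked inside the VM as e.g. ./zswap-bench.py zstd zblock make -j32.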