Re: [PATCH v2] mm: add zblock allocator

From: Igor Belousov
Date: Thu Apr 10 2025 - 03:02:49 EST


Hi Johannes,

Sure. zstd/8 cores/make -j32:

zsmalloc:
real 7m36.413s
user 38m0.481s
sys 7m19.108s
Zswap: 211028 kB
Zswapped: 925904 kB
zswpin 397851
zswpout 1625707
zswpwb 5126

zblock:
real 7m55.009s
user 39m23.147s
sys 7m44.004s
Zswap: 253068 kB
Zswapped: 919956 kB
zswpin 456843
zswpout 2058963
zswpwb 3921
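
For reference, the knobs and counters behind the numbers above are
roughly the following (a simplified sketch, not the exact harness; swap
setup and the memory limits needed to actually push the build into swap
are omitted):

  echo zstd   > /sys/module/zswap/parameters/compressor
  echo zblock > /sys/module/zswap/parameters/zpool    # or zsmalloc
  echo Y      > /sys/module/zswap/parameters/enabled

  # the workload itself
  time make -j32 bzImage

  # compressed pool size vs. amount of data stored in it
  grep -E '^(Zswap|Zswapped):' /proc/meminfo

  # zswap store/load/writeback event counters (cumulative since boot)
  grep -E '^zswp(in|out|wb) ' /proc/vmstat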

> So zstd results in nearly double the compression ratio, which in turn
> cuts total execution time *almost in half*.
>
> The numbers speak for themselves. Compression efficiency >>> allocator
> speed, because compression efficiency ultimately drives the continuous
> *rate* at which allocations need to occur. You're trying to optimize a
> constant coefficient at the expense of a higher-order one, which is a
> losing proposition.

Actually, there's a slight bug in the zblock code for the 4K page case
which caused storage inefficiency for small (i.e. well compressed)
memory blocks. With that one fixed, the results look a lot brighter for
zblock:

1. zblock/zstd/8 cores/make -j32 bzImage
real 7m28.290s
user 37m27.055s
sys 7m18.629s
Zswap: 221516 kB
Zswapped: 904104 kB
zswpin 425424
zswpout 2011503
zswpwb 4111
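
One way to see what the fix does to storage efficiency is the
Zswapped/Zswap ratio from /proc/meminfo (uncompressed data stored vs.
pool size, i.e. compression including allocator overhead): zblock/zstd
goes from ~3.6x (919956/253068) before the fix to ~4.1x (904104/221516)
with it, against ~4.4x (925904/211028) for zsmalloc/zstd above. A quick
way to read it off a running system:

  awk '/^Zswap:/ {z=$2} /^Zswapped:/ {printf "%.2fx\n", $2/z}' /proc/meminfo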

For the sake of completeness, I re-ran that test with the bugfix and
LZ4 (so, zblock/lz4/8 cores/make -j32 bzImage) and got:
real 7m44.154s
user 38m26.645s
sys 7m38.302s
zswpin 648108
zswpout 2490449
zswpwb 9499

So there's *no* significant cut in execution time with zstd (7m28s vs
7m44s wall clock, about a 3.5% difference), even on a Ryzen 9, and that
invalidates your point. Sorry for the past confusion, it was an honest
mistake on our side. If zsmalloc hadn't OOMed with lz4, we probably
would have seen the discrepancy and found the bug earlier.

And on the ARM64 and RISC-V targets we have run the tests on, zstd is
slower than lz4.

/Igor