Re: [PATCH v2] mm: add zblock allocator

From: Vitaly Wool
Date: Tue Apr 08 2025 - 17:40:34 EST

On 4/8/25 21:55, Johannes Weiner wrote:
> On Tue, Apr 08, 2025 at 01:20:11PM +0400, Igor Belousov wrote:
>>>> Now what's funny is that when I tried to compare how a 32-thread
>>>> build would behave on an 8-core VM, I couldn't do it, because it
>>>> OOMs with zsmalloc as the zswap backend. With zblock it doesn't,
>>>> though, and the results are:
>>>> real 12m14.012s
>>>> user 39m37.777s
>>>> sys 14m6.923s
>>>> Zswap: 440148 kB
>>>> Zswapped: 924452 kB
>>>> zswpin 594812
>>>> zswpout 2802454
>>>> zswpwb 10878
>>>>
>>>> It's LZ4 for all the test runs.
>>>
>>> Can you try zstd and let me know how it goes :)

>> Sure. zstd/8 cores/make -j32:
>>
>> zsmalloc:
>> real 7m36.413s
>> user 38m0.481s
>> sys 7m19.108s
>> Zswap: 211028 kB
>> Zswapped: 925904 kB
>> zswpin 397851
>> zswpout 1625707
>> zswpwb 5126
>>
>> zblock:
>> real 7m55.009s
>> user 39m23.147s
>> sys 7m44.004s
>> Zswap: 253068 kB
>> Zswapped: 919956 kB
>> zswpin 456843
>> zswpout 2058963
>> zswpwb 3921

> So zstd results in nearly double the compression ratio, which in turn
> cuts total execution time *almost in half*.
>
> The numbers speak for themselves. Compression efficiency >>> allocator
> speed, because compression efficiency ultimately drives the continuous
> *rate* at which allocations need to occur. You're trying to optimize a
> constant coefficient at the expense of a higher-order one, which is a
> losing proposition.
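As a sanity check on the quoted ratio claim, the compression ratios implied by the Zswap/Zswapped figures above (compressed vs. uncompressed kB) can be computed directly; the arithmetic below uses only the numbers quoted in this thread:

```python
# Compression ratio = Zswapped (uncompressed kB) / Zswap (compressed kB),
# taken from the figures quoted above.
runs = {
    "zblock+lz4":    (924452, 440148),  # from the earlier LZ4 run
    "zsmalloc+zstd": (925904, 211028),
    "zblock+zstd":   (919956, 253068),
}

ratios = {name: zswapped / zswap for name, (zswapped, zswap) in runs.items()}

for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {r:.2f}x")

# Gain of zstd over LZ4 on the same allocator (zblock):
print(f"zstd/lz4 gain: {ratios['zblock+zstd'] / ratios['zblock+lz4']:.2f}x")
```

This yields roughly 2.1x for zblock+LZ4 versus 3.6x for zblock+zstd and 4.4x for zsmalloc+zstd, i.e. zstd does come in at close to double the LZ4 ratio.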

Well, not really. This is an isolated use case with
a. significant computing power under the hood
b. relatively few cores
c. relatively short test
d. 4K pages

If any of these doesn't hold, zblock dominates:
!a => zstd is too slow
!b => parallelization yields a bigger gain
!c => zsmalloc starts losing due to internal fragmentation
!d => zblock's compression efficiency is better

Even !d alone makes zblock the better choice for ARM64-based servers.
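For anyone wanting to reproduce the comparison: zswap's allocator and compressor are runtime-tunable through the standard module parameters in sysfs. A sketch, run as root, assuming a kernel with this patch applied (otherwise "zblock" is not an accepted zpool value):

```shell
# The !d case above concerns non-4K base pages; check what this kernel uses
# (e.g. 16384 or 65536 on arm64 kernels built with 16K/64K pages).
getconf PAGESIZE

# Select the compressor and allocator under test, then enable zswap.
echo zstd   > /sys/module/zswap/parameters/compressor
echo zblock > /sys/module/zswap/parameters/zpool
echo Y      > /sys/module/zswap/parameters/enabled

# Observe the same counters quoted in this thread.
grep Zswap /proc/meminfo                       # Zswap: / Zswapped:
grep -E 'zswpin|zswpout|zswpwb' /proc/vmstat   # swap-in/out/writeback counts
```

Swapping `zblock` for `zsmalloc` and `zstd` for `lz4` between otherwise identical build runs gives the A/B numbers discussed above.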

~Vitaly