On Wed, Apr 23, 2025 at 09:53:48PM +0200, Vitaly Wool wrote:
On 4/22/25 12:46, Yosry Ahmed wrote:
I didn't look too closely but I generally agree that we should improve
zsmalloc where possible rather than add a new allocator. We are trying
not to repeat the zbud/z3fold or slub/slob stories here. Zsmalloc is
getting a lot of mileage from both zswap and zram, and is more-or-less
battle-tested. Let's work toward building upon that instead of starting
over.
The thing here is that zblock uses a very different approach to small object
allocation. The idea is: we have an array of descriptors which correspond to
multi-page blocks divided into chunks of equal size (block_size[i]). For each
object of size x we find the descriptor n such that:
block_size[n-1] < x <= block_size[n]
and then we store that object in an empty slot in one of the blocks. Thus,
the density is high, the search is fast (rbtree based), and no object spans
2 pages, so no extra memcpy is involved.
The block sizes seem to be similar in principle to class sizes in
zsmalloc. It seems to me that there are two apparent differentiating
properties of zblock:
- Block lookup uses an rbtree, so it's faster than zsmalloc's list
iteration. On the other hand, zsmalloc divides each class into
fullness groups and tries to pack almost full groups first. Not sure
if zblock's approach is strictly better.
- Zblock uses higher-order allocations vs. zsmalloc always using order-0
allocations. I think this may be the main advantage, and I remember
asking if zsmalloc could support this. Always using order-0 pages is
more reliable but may not always be the best choice.
On the other hand, zblock is lacking in other regards. For example:
- The lack of compaction means that certain workloads will see a lot of
fragmentation. It purely depends on the access patterns. We could end
up with a lot of blocks each containing a single object and there is
no way to recover AFAICT.
- Zblock will fail if a high-order allocation cannot be satisfied, which
is more likely to happen under memory pressure, and that is usually when
zblock is needed in the first place.
- There's probably more, I didn't check too closely, and I am hoping
that Minchan and Sergey will chime in here.
And with the latest zblock, we see that it has a clear advantage in
performance over zsmalloc, retaining roughly the same allocation density for
4K pages and scoring better on 16K pages. E.g., on a kernel compilation:
* zsmalloc/zstd/make -j32 bzImage
real 8m0.594s
user 39m37.783s
sys 8m24.262s
Zswap: 200600 kB <-- after build completion
Zswapped: 854072 kB <-- after build completion
zswpin 309774
zswpout 1538332
* zblock/zstd/make -j32 bzImage
real 7m35.546s
user 38m03.475s
sys 7m47.407s
Zswap: 250940 kB <-- after build completion
Zswapped: 870660 kB <-- after build completion
zswpin 248606
zswpout 1277319
So what we see here is that zblock is definitely faster and at least no
worse with regard to allocation density under heavy load. It has slightly
worse _idle_ allocation density, but since it quickly catches up under
load, that is not really important. What is important is that its
characteristics don't deteriorate over time. Overall, zblock is simple and
efficient, and there is a /raison d'être/ for it.
Zblock is performing better for this specific workload, but as I
mentioned earlier there are other aspects that zblock is missing.
Zsmalloc has seen a very large range of workloads of different types,
and we cannot just dismiss this.
Now, it is indeed possible to partially rework zsmalloc using zblock's
algorithm, but that would be a rather substantial change, equal or greater
in effort to implementing the approach described above from scratch (which
is what we did). With such drastic changes, most of the testing that has
been done with zsmalloc would be invalidated, and we'd be out in the wild
anyway. So even though I see your point, I don't think it applies in this
particular case.
Well, we should start by breaking down the differences and finding out
why zblock is performing better, as I mentioned above. If it's the
faster lookups or higher order allocations, we can work to support that
in zsmalloc. Similarly, if zsmalloc has unnecessary complexity it'd be
great to get rid of it rather than starting over.
Also, we don't have to do it all at once and invalidate the testing that
zsmalloc has seen. These can be incremental changes that get spread over
multiple releases, getting incremental exposure in the process.