Re: [PATCH] zswap: use B-tree for search

From: Vitaly Wool
Date: Mon Nov 18 2019 - 16:59:23 EST


On Mon, Nov 18, 2019 at 8:04 PM Jörn Engel <joern@xxxxxxxxxxxxxxx> wrote:
>
> On Sun, Nov 17, 2019 at 08:53:32PM +0200, vitaly.wool@xxxxxxxxxxxx wrote:
> > From: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> >
> > The current zswap implementation uses red-black trees to store
> > entries and to perform lookups. Although this algorithm obviously
> > has O(log N) complexity, it still takes a while to complete a
> > lookup (and even more so a replacement) of an entry when the number
> > of entries is huge (100K+).
> >
> > B-trees are known to handle such cases more efficiently (i.e. also
> > with O(log N) complexity, but with a much lower constant factor), so
> > trying zswap with B-trees was worth a shot.
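To make the "lower constant factor" point concrete, here is a minimal sketch that counts the minimum number of tree levels (roughly, node visits and therefore potential cache misses per lookup) needed to hold n keys for a given fanout. A binary search tree corresponds to fanout 2, and a red-black tree may be up to about twice as tall as that minimum. The fanout of 16 used below is an arbitrary assumption, not the node geometry of the kernel's lib/btree.c. Note that a B-tree still does several key comparisons per node, so the gain is mainly in memory accesses rather than in comparison count.

```c
#include <stddef.h>

/*
 * Minimum number of levels a search tree needs to hold n keys when each
 * node has up to `fanout` children (and so up to fanout - 1 keys).
 * A tree with h such levels holds at most fanout^h - 1 keys, so this
 * returns the smallest h with fanout^h > n.
 * Assumes n is well below ULONG_MAX / fanout so `capacity` cannot overflow.
 */
static unsigned int min_tree_levels(unsigned long n, unsigned long fanout)
{
	unsigned int h = 0;
	unsigned long capacity = 1;	/* fanout^h */

	if (fanout < 2)
		return 0;	/* degenerate fanout: not a search tree, avoid looping forever */

	while (capacity <= n) {
		capacity *= fanout;
		h++;
	}
	return h;
}
```

For n = 100000 this gives 17 levels with fanout 2 and 5 levels with fanout 16.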
> >
> > The implementation of B-trees currently present in the Linux kernel
> > isn't really doing things in the best possible way (i.e. it uses
> > recursion), but the testing I've run still shows a very significant
> > performance increase.
> >
> > The usage pattern of the B-tree here does not strictly follow the
> > guidelines, because pgoff_t may be either 32 or 64 bits long.
> >
> > Tested on qemu-kvm (-smp 2 -m 1024) with zswap in the following
> > configuration:
> > * zpool: z3fold
> > * max_pool_percent: 100
> > and a swap size of 1G.
> >
> > Test command:
> > $ stress-ng --io 4 --vm 4 --vm-bytes 1000M --timeout 300s --metrics
> >
> > Averaged over 20 runs on qemu-kvm (-smp 2 -m 1024), this gives the
> > following io bogo ops:
> > * original: 73778.8
> > * btree: 393999
>
> Impressive results. Was your test done with a 32-bit guest? If yes, I
> would expect the results for a 64-bit guest to drop to about 330k.

No, it's on a 64-bit virtual machine. I take it this improvement is partially
due to the zswap_insert_or_replace function, which requires fewer lookups
than the initial implementation, but it's the btree API that made it possible.
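The lookup saving can be illustrated with a minimal sketch on a toy sorted array that counts key searches. It contrasts an insert that rejects duplicate keys, followed by erase and retry, with a single insert-or-replace that overwrites the value in place. The array, its capacity and every function name here are hypothetical stand-ins; this is not the patch's zswap_insert_or_replace() and not the kernel btree API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 64	/* hypothetical fixed capacity */

/* Toy ordered map (sorted array) that counts full key searches. */
struct toy_map {
	unsigned long keys[TOY_CAP];
	void *vals[TOY_CAP];
	size_t n;
	unsigned int searches;
};

/* Binary search: returns the index of key if present (*found = 1),
 * otherwise the position where it would be inserted (*found = 0). */
static size_t toy_search(struct toy_map *m, unsigned long key, int *found)
{
	size_t lo = 0, hi = m->n;

	m->searches++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (m->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	*found = lo < m->n && m->keys[lo] == key;
	return lo;
}

/* Shift entries right and store (key, val) at position i; caller checks capacity. */
static void toy_put_at(struct toy_map *m, size_t i, unsigned long key, void *val)
{
	memmove(&m->keys[i + 1], &m->keys[i], (m->n - i) * sizeof(m->keys[0]));
	memmove(&m->vals[i + 1], &m->vals[i], (m->n - i) * sizeof(m->vals[0]));
	m->keys[i] = key;
	m->vals[i] = val;
	m->n++;
}

/* Insert that rejects duplicates: -EEXIST if key exists, -ENOSPC if full. */
static int toy_insert(struct toy_map *m, unsigned long key, void *val)
{
	int found;
	size_t i = toy_search(m, key, &found);

	if (found)
		return -EEXIST;
	if (m->n == TOY_CAP)
		return -ENOSPC;
	toy_put_at(m, i, key, val);
	return 0;
}

/* Remove key if present (costs one search). */
static void toy_erase(struct toy_map *m, unsigned long key)
{
	int found;
	size_t i = toy_search(m, key, &found);

	if (!found)
		return;
	memmove(&m->keys[i], &m->keys[i + 1], (m->n - i - 1) * sizeof(m->keys[0]));
	memmove(&m->vals[i], &m->vals[i + 1], (m->n - i - 1) * sizeof(m->vals[0]));
	m->n--;
}

/* One search: overwrite an existing value in place or insert a new entry.
 * The replaced value (or NULL) is returned through *old. */
static int toy_insert_or_replace(struct toy_map *m, unsigned long key,
				 void *val, void **old)
{
	int found;
	size_t i = toy_search(m, key, &found);

	*old = NULL;
	if (found) {
		*old = m->vals[i];
		m->vals[i] = val;
		return 0;
	}
	if (m->n == TOY_CAP)
		return -ENOSPC;
	toy_put_at(m, i, key, val);
	return 0;
}
```

When the key already exists, `while (toy_insert(m, k, v) == -EEXIST) toy_erase(m, k);` performs three searches (failed insert, erase, successful insert), while toy_insert_or_replace() performs one. In a large tree each search is a full root-to-leaf walk, which is where the saving comes from.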

> > + if (sizeof(pgoff_t) == 8)
> > + btree_pgofft_geo = &btree_geo64;
> > + else
> > + btree_pgofft_geo = &btree_geo32;
> > +
>
> You could abuse the fact that pgoff_t is the same size as unsigned long
> and use the "l" suffix variant. But apart from the obvious abuse, the
> "l" variant hasn't been used before and the implementation appears to be
> buggy.
>
> So no complaints about your use of the interface.

Thanks! I'll keep it as is, then, and make a note to try out and
possibly debug the "l" suffix variant later on.
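The sizeof(pgoff_t) check quoted above picks a B-tree geometry whose key width matches pgoff_t. The userspace sketch below mirrors that decision with a hypothetical stand-in for a geometry descriptor that records only how many unsigned longs make up one key. The struct, the two instances and pick_geo() are assumptions for illustration; the kernel's btree_geo32 and btree_geo64 may carry more information than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a B-tree geometry descriptor: only records
 * how many unsigned longs one key occupies. */
struct geo_sketch {
	size_t keylen;
};

/* A 32-bit key always fits in one unsigned long. */
static const struct geo_sketch geo32_sketch = { .keylen = 1 };

/* A 64-bit key needs one long on 64-bit targets and two on 32-bit ones. */
static const struct geo_sketch geo64_sketch = {
	.keylen = sizeof(uint64_t) / sizeof(unsigned long),
};

/* Pick the geometry matching the width of the key type, mirroring the
 * sizeof(pgoff_t) == 8 test in the patch. */
static const struct geo_sketch *pick_geo(size_t key_size)
{
	return key_size == 8 ? &geo64_sketch : &geo32_sketch;
}
```

With a key type as wide as unsigned long (which, as Jörn notes, pgoff_t is), pick_geo(sizeof(unsigned long)) in this sketch yields a one-long key on both 32-bit and 64-bit targets.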

~Vitaly