Re: [PATCH v2 0/6] Introduce __mt_dup() to improve the performance of fork()

From: Liam R. Howlett
Date: Thu Sep 07 2023 - 16:20:37 EST


* Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [230830 08:57]:
> In the process of duplicating mmap in fork(), VMAs will be inserted into the new
> maple tree one by one. When inserting into the maple tree, the maple tree will
> be rebalanced multiple times. The rebalancing of maple tree is not as fast as
> the rebalancing of red-black tree and will be slower. Therefore, __mt_dup() is
> introduced to directly duplicate the structure of the old maple tree, and then
> modify each element of the new maple tree. This avoids rebalancing and some extra
> copying, so is faster than the original method.
> More information can refer to [1].

Thanks for this patch set, it's really coming along nicely.

>
> There is a "spawn" in byte-unixbench[2], which can be used to test the performance
> of fork(). I modified it slightly to make it work with different number of VMAs.
>
> Below are the test numbers. There are 21 VMAs by default. The first row indicates
> the number of added VMAs. The following two lines are the number of fork() calls
> every 10 seconds. These numbers are different from the test results in v1 because
> this time the benchmark is bound to a CPU. This way the numbers are more stable.
>
> Increment of VMAs: 0 100 200 400 800 1600 3200 6400
> 6.5.0-next-20230829: 111878 75531 53683 35282 20741 11317 6110 3158
> Apply this patchset: 114531 85420 64541 44592 28660 16371 9038 4831
> +2.37% +13.09% +20.23% +26.39% +38.18% +44.66% +47.92% +52.98%

Can you run this with the default 21 as well?

>
> Todo:
> - Update the documentation.
>
> Changes since v1:
> - Reimplement __mt_dup() and mtree_dup(). Loops are implemented without using
> goto instructions.
> - The new tree also needs to be locked to avoid some lockdep warnings.
> - Drop and add some helpers.

I guess this also includes the changes to remove the new ways of finding
a node end and using that extra bit in the address? Those were
significant and welcome changes. Thanks.

> - Add test for duplicating full tree.
> - Drop mas_replace_entry(), it doesn't seem to have a big impact on the
> performance of fork().
>
> [1] https://lore.kernel.org/lkml/463899aa-6cbd-f08e-0aca-077b0e4e4475@xxxxxxxxxxxxx/
> [2] https://github.com/kdlucas/byte-unixbench/tree/master
>
> v1: https://lore.kernel.org/lkml/20230726080916.17454-1-zhangpeng.00@xxxxxxxxxxxxx/
>
> Peng Zhang (6):
> maple_tree: Add two helpers
> maple_tree: Introduce interfaces __mt_dup() and mtree_dup()
> maple_tree: Add test for mtree_dup()
> maple_tree: Skip other tests when BENCH is enabled
> maple_tree: Update check_forking() and bench_forking()
> fork: Use __mt_dup() to duplicate maple tree in dup_mmap()
>
> include/linux/maple_tree.h | 3 +
> kernel/fork.c | 34 ++-
> lib/maple_tree.c | 277 ++++++++++++++++++++++++-
> lib/test_maple_tree.c | 69 +++---
> mm/mmap.c | 14 +-
> tools/testing/radix-tree/maple.c | 346 +++++++++++++++++++++++++++++++
> 6 files changed, 697 insertions(+), 46 deletions(-)
>
> --
> 2.20.1
>