Re: [PATCH 0/5] Reduce a vmalloc internal lock contention preparation work

From: Uladzislau Rezki
Date: Wed Jun 08 2022 - 06:20:38 EST


>
> I can toss it in for some runtime testing, but...
>
> What lock are we talking about here, what is the magnitude of the
> performance issues it is causing and what is the status of the patch
> which uses all this preparation?
>
1.
vmalloc still uses a global lock to access the global vmap space. As for
the magnitude, it depends on the number of CPUs: the more CPUs, the higher
the contention. The dependence is linear.

2.
I have not run into performance issues on my own setup. On the other
hand, there is a "Per cpu kva allocator" built on top of vmalloc; see
vm_map_ram() and vm_unmap_ram(). With a per-CPU vmalloc we could get rid
of it.

It is used by XFS, f2fs and some drivers. The reason is that vmalloc is
costly because of its internal global lock, which is why those users go
with the "Per cpu kva allocator" to accelerate their workloads.

3.
My synthetic test shows a big difference between the per-CPU vmalloc
patches and the default variant. I have several prototypes based on
different ways of making it per-CPU. I still do not have a complete
solution that satisfies all the needs, and I do not think one is possible
because of the many constraints.

4.
This series is not tied to the future per-cpu-vmalloc patches. Rather, it
makes the vmalloc code more generic, and that common code will be easier
to extend to a per-CPU variant.

This means that if the per-CPU work does not land, this series does not
need to be reverted.

That is the status.
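To illustrate points 1 and 2, here is a rough user-space sketch (not
kernel code). Every name in it (global_alloc(), percpu_alloc(),
BLOCK_PAGES, NR_CPUS) is hypothetical. It only mirrors the general idea
behind a per-CPU front end such as the one used by vm_map_ram(): each CPU
caches a block of address space and serves small requests from it, so the
global lock is taken once per refill instead of once per allocation. It
has no free path and runs no real threads; lock acquisitions are simply
counted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS      4           /* hypothetical CPU count for the model */
#define BLOCK_PAGES  16          /* hypothetical: pages taken per refill */
#define GLOBAL_PAGES (1UL << 20) /* size of the modeled address space */

/* Test-and-set spinlock. Acquisitions are counted so the two paths can be
 * compared without threads. Zero-initialized static atomic_bool is valid. */
struct spinlock {
	atomic_bool locked;
	unsigned long acquired;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;	/* busy-wait, as a contending CPU would */
	l->acquired++;
}

static void spin_unlock(struct spinlock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Global address space: one bump pointer (in pages) behind one lock. */
static struct spinlock global_lock;
static unsigned long global_next;

/* Path A: every request serializes on global_lock.
 * Returns the first page index, or -1 if the space is exhausted. */
static long global_alloc(unsigned long pages)
{
	long addr = -1;

	spin_lock(&global_lock);
	if (GLOBAL_PAGES - global_next >= pages) {
		addr = (long)global_next;
		global_next += pages;
	}
	spin_unlock(&global_lock);
	return addr;
}

/* Path B: each CPU caches one block carved out of the global space. */
struct cpu_block {
	struct spinlock lock;	/* per-CPU lock: CPUs do not contend here */
	unsigned long next;	/* next free page inside the cached block */
	unsigned long left;	/* pages remaining in the cached block */
};

static struct cpu_block cpu_blocks[NR_CPUS];

static long percpu_alloc(int cpu, unsigned long pages)
{
	struct cpu_block *b;
	long addr = -1;

	assert(cpu >= 0 && cpu < NR_CPUS);
	if (pages > BLOCK_PAGES)
		return global_alloc(pages);	/* large requests bypass the cache */

	b = &cpu_blocks[cpu];
	spin_lock(&b->lock);
	if (b->left < pages) {
		/* Refill: the only place this path takes global_lock. Whatever
		 * was left of the old block is abandoned in this toy model. */
		long base = global_alloc(BLOCK_PAGES);

		if (base >= 0) {
			b->next = (unsigned long)base;
			b->left = BLOCK_PAGES;
		}
	}
	if (b->left >= pages) {
		addr = (long)b->next;
		b->next += pages;
		b->left -= pages;
	}
	spin_unlock(&b->lock);
	return addr;
}
```

With BLOCK_PAGES set to 16, 64 single-page requests on one CPU take the
global lock 4 times through percpu_alloc(), against 64 times through
global_alloc().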

--
Uladzislau Rezki