Re: [PATCH] mm: vmalloc: Use the vmap_area_lock to protect ne_fit_preload_node

From: Uladzislau Rezki
Date: Mon Oct 07 2019 - 17:44:33 EST


On Mon, Oct 07, 2019 at 07:36:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-10-07 18:56:11 [+0200], Uladzislau Rezki wrote:
> > Actually there is high lock contention on vmap_area_lock, because it
> > is still global. You can have a look at the last slide:
> >
> > https://linuxplumbersconf.org/event/4/contributions/547/attachments/287/479/Reworking_of_KVA_allocator_in_Linux_kernel.pdf
> >
> > so this change will make it a bit higher. On the other hand I agree
> > that for RT it should be fixed; probably it could be done like:
> >
> > #ifdef PREEMPT_RT
> >     migrate_disable()
> > #else
> >     preempt_disable()
> > #endif
> > ...
> >
> > but i am not sure it is good either.
>
> What is to be expected on average? Is the lock acquired and then
> released again because the slot is empty and memory needs to be
> allocated or can it be assumed that this hardly happens?
>
The lock is not released (we are not allowed to drop it there); instead
we just try to allocate with the GFP_NOWAIT flag. That can happen if the
earlier preallocation with the GFP_KERNEL flag failed:

<snip>
	...
	} else if (type == NE_FIT_TYPE) {
		/*
		 * Split no edge of fit VA.
		 *
		 *     |       |
		 *   L V  NVA  V R
		 *   |---|-------|---|
		 */
		lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
		if (unlikely(!lva)) {
			...
			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
			...
		}
		...
<snip>

How often do we need an extra object for the split? It depends on the
workload; the fork() path, for example, hits that pattern.

I think we can assume that migration hardly ever happens and should be
considered a rare case. Thus we can do the preloading without worrying
much about it:

<snip>
urezki@pc636:~/data/ssd/coding/linux-stable$ git diff
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..bc782edcd1fd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1089,20 +1089,16 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Even if it fails we do not really care about that. Just proceed
 	 * as it is. "overflow" path will refill the cache we allocate from.
 	 */
-	preempt_disable();
-	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
+	if (!this_cpu_read(ne_fit_preload_node)) {
 		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
 
-		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
+		if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
 			if (pva)
 				kmem_cache_free(vmap_area_cachep, pva);
 		}
 	}
 
 	spin_lock(&vmap_area_lock);
-	preempt_enable();
 
 	/*
 	 * If an allocation fails, the "vend" address is
urezki@pc636:~/data/ssd/coding/linux-stable$
<snip>

So, we do not guarantee that a preloaded object is always there; instead
we minimize the number of allocations done with the GFP_NOWAIT flag. For
example, on my 4xCPUs box I am not able to even trigger the case where a
CPU is not preloaded.
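
To make the pattern explicit, here is a minimal sketch only (not the
actual mm/vmalloc.c code; the helper names preload_this_cpu_sketch() and
take_preload_or_nowait() are made up for illustration). It assumes the
existing ne_fit_preload_node per-CPU pointer and vmap_area_cachep:

<snip>
static void preload_this_cpu_sketch(int node)
{
	struct vmap_area *pva;

	/* May run on CPU A; preemption is no longer disabled here. */
	if (!this_cpu_read(ne_fit_preload_node)) {
		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);

		/*
		 * If we migrated, or someone else preloaded this CPU
		 * in the meantime, give the extra object back.
		 */
		if (this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
			if (pva)
				kmem_cache_free(vmap_area_cachep, pva);
		}
	}
}

static struct vmap_area *take_preload_or_nowait(void)
{
	struct vmap_area *lva;

	/*
	 * NE_FIT split path, runs under vmap_area_lock. If the preload
	 * ended up on another CPU, fall back to GFP_NOWAIT.
	 */
	lva = this_cpu_xchg(ne_fit_preload_node, NULL);
	if (unlikely(!lva))
		lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);

	return lva;
}
<snip>

So the worst a migration can do is cost one extra GFP_NOWAIT allocation
under vmap_area_lock; correctness is not affected.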

I can test it tomorrow on my 12xCPUs box to see how it behaves there.

--
Vlad Rezki