Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

From: Uladzislau Rezki
Date: Mon Aug 17 2020 - 06:36:14 EST


On Mon, Aug 17, 2020 at 10:28:49AM +0200, Michal Hocko wrote:
> On Mon 17-08-20 00:56:55, Uladzislau Rezki wrote:
> [...]
> > Michal asked to provide some data regarding how many pages we need and how
> > "lockless allocation" behaves when it comes to success vs failed scenarios.
> >
> > Please see below some results. The test case is a tight loop of 1 000 000 allocations
> > doing kmalloc() and kfree_rcu():
>
> It would be nice to cover some more realistic workloads as well.
>
Hmm.. I tried to show a synthetic worst case, when a "flood" occurs.
In such conditions we can get failures, which is expected, and we have
a fallback mechanism for it.

> > sudo ./test_vmalloc.sh run_test_mask=2048 single_cpu_test=1
> >
> > <snip>
> > for (i = 0; i < 1000000; i++) {
> > 	p = kmalloc(sizeof(*p), GFP_KERNEL);
> > 	if (!p)
> > 		return -1;
> >
> > 	p->array[0] = 'a';
> > 	kvfree_rcu(p, rcu);
> > }
> > <snip>
> >
> > wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_0.png
>
> If I understand this correctly then this means that failures happen very
> often because pcp pages are not recycled quickly enough.
>
Yep, it happens, and that is kind of the worst-case (flood) scenario. That is
why we have a fallback, and the failures are expected. Also, I did not provide
the number of pages used in the loop. On my test machine we need approximately
300-400 pages to cover that flood case until we recycle or return the pages
back to the pcp.

Please note, as I mentioned before, our drain part is certainly not optimal,
which means we can rework it a bit to make it more efficient. For example,
when a flood occurs, instead of delaying the "reclaimer logic" thread, it can
be placed on a run-queue right away. We can also use a separate "flush
workqueue" tagged with WQ_MEM_RECLAIM, raising the priority of the drain
context.

i.e. there is room for reducing the page footprint.
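
To illustrate why a shorter drain delay means fewer pages, here is a rough
userspace toy model. It is not the kernel code and not what the patch does;
simulate_flood() and all of its parameters are made up for this sketch. It
assumes one freed pointer per loop iteration, a fixed cache of pages, and a
filled page coming back to the cache a fixed number of iterations later.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy userspace model of the flood case. NOT kernel code; every name and
 * parameter here is hypothetical.
 *
 * Each iteration queues one pointer. Pointers are batched into pages that
 * hold ptrs_per_page entries. A new page is taken from a fixed cache of
 * cache_pages pages; if the cache is empty the iteration counts as a
 * failure (the real code would take its fallback path). A filled page is
 * returned to the cache drain_delay iterations after it became full.
 *
 * Returns the number of failed page requests, or -1 if the bookkeeping
 * array cannot be allocated.
 */
long simulate_flood(long iterations, long cache_pages,
		    long ptrs_per_page, long drain_delay)
{
	long cap = cache_pages > 0 ? cache_pages : 1;
	long *return_at = malloc(cap * sizeof(*return_at));
	long head = 0, tail = 0, in_flight = 0;
	long free_pages = cache_pages > 0 ? cache_pages : 0;
	long used = 0, failures = 0;
	int have_page = 0;

	if (!return_at)
		return -1;

	for (long i = 0; i < iterations; i++) {
		/* Drain side: give back pages whose delay has expired. */
		while (in_flight > 0 && return_at[head] <= i) {
			head = (head + 1) % cap;
			in_flight--;
			free_pages++;
		}

		if (!have_page) {
			if (free_pages == 0) {
				failures++;	/* fallback would be used here */
				continue;
			}
			free_pages--;
			have_page = 1;
			used = 0;
		}

		if (++used == ptrs_per_page) {
			/* Page is full: hand it over to the drain side. */
			assert(in_flight < cap);
			return_at[tail] = i + drain_delay;
			tail = (tail + 1) % cap;
			in_flight++;
			have_page = 0;
		}
	}

	free(return_at);
	return failures;
}
```

In this model simulate_flood(100, 1, 10, 5) reports 28 failed page requests,
whereas the same run with drain_delay set to 0, or with 2 pages in the cache,
reports none. The real numbers of course depend on the RCU grace period and
on scheduling, which this toy ignores.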

> > wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_8.png
>
> 1/8 of the memory in pcp lists is quite large and likely not something
> used very often.
>
Just for illustration. When percpu_pagelist_fraction is set to 8, I do
not see any page allocation failures in the single-CPU flood case. If I run
such a flood simultaneously on all available CPUs, there will be failures
for sure.

> Both these numbers just make me think that a dedicated pool of pages
> pre-allocated for RCU specifically might be a better solution. I still
> haven't read through that branch of the email thread though so there
> might be some pretty convincing arguments to not do that.
>
> > Also I would like to underline that the kfree_rcu() reclaim logic can be improved further,
> > making the drain logic more time-efficient and thus reducing the footprint, i.e. the
> > number of required pages.
> >
> > --
> > Vlad Rezki
>
> --
> Michal Hocko
> SUSE Labs