Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking
From: Roman Gushchin
Date: Wed Oct 21 2020 - 22:46:11 EST
On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote:
>
>
> On 2020/10/17 6:52, Roman Gushchin wrote:
>
> > This small patchset makes cma_release() non-blocking and simplifies
> > the code in hugetlbfs, where previously we had to temporarily drop
> > hugetlb_lock around the cma_release() call.
> >
> > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic
> > THP under a memory pressure requires a cma_release() call. If it's
> > a blocking function, it complicates the already complicated code.
> > Because there are at least two use cases like this (hugetlbfs is
> > another example), I believe it's just better to make cma_release()
> > non-blocking.
> >
> > It also makes it more consistent with other memory releasing functions
> > in the kernel: most of them are non-blocking.
> >
> >
> > Roman Gushchin (2):
> > mm: cma: make cma_release() non-blocking
> > mm: hugetlb: don't drop hugetlb_lock around cma_release() call
> >
> > mm/cma.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > mm/hugetlb.c | 6 ------
> > 2 files changed, 49 insertions(+), 8 deletions(-)
> >
> I don't think this patch is a good idea. It transfers part or even all of the time of
> cma_release() to cma_alloc(), which matters more for performance indicators.
I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() will
wait for the cma_lock mutex anyway. So we don't really transfer anything to cma_alloc().
> On Android phones, CPU resource competition is intense in many scenarios.
> As a result, kernel threads and workers may be scheduled only after several ticks or more.
> In this case, the performance of cma_alloc will deteriorate significantly,
> which is not good news for many services on Android.
OK, I agree: if the CPU is heavily loaded, it might affect the total execution time.
If we don't go in the mutex->spinlock conversion direction (as Mike suggested),
we can address the performance concerns by introducing a cma_release_nowait() function,
so that the default cma_release() would work the old way.
cma_release_nowait() can set an atomic flag on a cma area, which will cause subsequent
cma_alloc() calls to flush the release queue. In this case there will be no performance
penalty unless somebody is using cma_release_nowait().
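To make the idea concrete, here is a rough userspace sketch of the flag-plus-queue scheme.
It is not the patch and not kernel code: struct cma_model, struct pending_release and all
cma_model_*() names are hypothetical, a pthread mutex stands in for cma_lock, and the area
is reduced to a page counter instead of a bitmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's struct cma. */
struct pending_release {
	struct pending_release *next;
	size_t nr_pages;
};

struct cma_model {
	pthread_mutex_t lock;			/* stands in for the cma_lock mutex */
	size_t total_pages;
	size_t used_pages;			/* protected by lock */
	atomic_bool release_pending;		/* the per-area atomic flag */
	_Atomic(struct pending_release *) queue; /* lock-free release queue */
};

void cma_model_init(struct cma_model *c, size_t total_pages)
{
	pthread_mutex_init(&c->lock, NULL);
	c->total_pages = total_pages;
	c->used_pages = 0;
	atomic_init(&c->release_pending, false);
	atomic_init(&c->queue, (struct pending_release *)NULL);
}

/* Non-blocking release: never takes the mutex, only queues the request. */
void cma_model_release_nowait(struct cma_model *c,
			      struct pending_release *req, size_t nr_pages)
{
	req->nr_pages = nr_pages;
	req->next = atomic_load(&c->queue);
	/* On failure the CAS reloads the current head into req->next. */
	while (!atomic_compare_exchange_weak(&c->queue, &req->next, req))
		;
	atomic_store(&c->release_pending, true);
}

/* Must be called with c->lock held. */
static void cma_model_flush_locked(struct cma_model *c)
{
	struct pending_release *req;

	/* Fast path: a single load if nobody used the nowait variant. */
	if (!atomic_load(&c->release_pending))
		return;
	atomic_store(&c->release_pending, false);

	req = atomic_exchange(&c->queue, (struct pending_release *)NULL);
	while (req) {
		struct pending_release *next = req->next;

		assert(req->nr_pages <= c->used_pages);
		c->used_pages -= req->nr_pages;
		req = next;
	}
}

/* Allocation flushes queued releases first, then tries to allocate. */
bool cma_model_alloc(struct cma_model *c, size_t nr_pages)
{
	bool ok = false;

	pthread_mutex_lock(&c->lock);
	cma_model_flush_locked(c);
	if (nr_pages <= c->total_pages - c->used_pages) {
		c->used_pages += nr_pages;
		ok = true;
	}
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Default release keeps the old behaviour: take the mutex, free directly. */
void cma_model_release(struct cma_model *c, size_t nr_pages)
{
	pthread_mutex_lock(&c->lock);
	assert(nr_pages <= c->used_pages);
	c->used_pages -= nr_pages;
	pthread_mutex_unlock(&c->lock);
}
```

In this model an allocation pays for one extra atomic load unless a nowait release is
pending, and pages released via the nowait path only become available again once the
next cma_model_alloc() call runs the flush.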
Will it work for you?
Thank you!