Re: [PATCH v2] mm: cma: indefinitely retry allocations in cma_alloc

From: Chris Goldsworthy
Date: Thu Sep 17 2020 - 14:31:24 EST


On 2020-09-15 00:53, David Hildenbrand wrote:
On 14.09.20 20:33, Chris Goldsworthy wrote:
On 2020-09-14 02:31, David Hildenbrand wrote:
On 11.09.20 21:17, Chris Goldsworthy wrote:

So, inside of cma_alloc(), instead of giving up when
alloc_contig_range()
returns -EBUSY after having scanned a whole CMA-region bitmap, perform
retries indefinitely, with sleeps, to give the system an opportunity
to
unpin any pinned pages.

Signed-off-by: Chris Goldsworthy <cgoldswo@xxxxxxxxxxxxxx>
Co-developed-by: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
Signed-off-by: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
---
mm/cma.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7..90bb505 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -442,8 +443,28 @@ struct page *cma_alloc(struct cma *cma, size_t
count, unsigned int align,
bitmap_maxno, start, bitmap_count, mask,
offset);
if (bitmap_no >= bitmap_maxno) {
- mutex_unlock(&cma->lock);
- break;
+ if (ret == -EBUSY) {
+ mutex_unlock(&cma->lock);
+
+ /*
+ * Page may be momentarily pinned by some other
+ * process which has been scheduled out, e.g.
+ * in exit path, during unmap call, or process
+ * fork and so cannot be freed there. Sleep
+ * for 100ms and retry the allocation.
+ */
+ start = 0;
+ ret = -ENOMEM;
+ msleep(100);
+ continue;
+ } else {
+ /*
+ * ret == -ENOMEM - all bits in cma->bitmap are
+ * set, so we break accordingly.
+ */
+ mutex_unlock(&cma->lock);
+ break;
+ }
}
bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
/*


What about long-term pinnings? IIRC, that can happen easily e.g., with
vfio (and I remember there is a way via vmsplice).

Not convinced trying forever is a sane approach in the general case ...

V1:
[1] https://lkml.org/lkml/2020/8/5/1097
[2] https://lkml.org/lkml/2020/8/6/1040
[3] https://lkml.org/lkml/2020/8/11/893
[4] https://lkml.org/lkml/2020/8/21/1490
[5] https://lkml.org/lkml/2020/9/11/1072

We're fine with doing indefinite retries, on the grounds that if there
is some long-term pinning that occurs when alloc_contig_range returns
-EBUSY, that it should be debugged and fixed. Would it be possible to
make this infinite-retrying something that could be enabled or disabled
by a defconfig option?

Two thoughts:

This means I strongly prefer something like [3] if feasible.

I can give [3] some further thought then. Also, I realized [3] will not completely solve the problem, it just reduces the window in which _refcount > _mapcount (as mentioned in earlier threads, we encountered the pinning when a task in copy_one_pte() or in the exit_mmap() path gets context switched out). If we were to try a sleeping-lock based solution, do you think it would be permissible to add another lock to struct page?

2. The issue that I am having is that long-term pinnings are
(unfortunately) a real thing. It's not something to debug and fix as you
suggest. Like, run a VM with VFIO (e.g., PCI passthrough). While that VM
is running, all VM memory will be pinned. If memory falls onto a CMA
region your cma_alloc() will be stuck in an (endless, meaning until the
VM ended) loop. I am not sure if all cma users are fine with that -
especially, think about CMA being used for gigantic pages now.

Assume you want to start a new VM while the other one is running and use
some (new) gigantic pages for it. Suddenly you're trapped in an endless
loop in the kernel. That's nasty.


Thanks for providing this example.


If we want to stick to retrying forever, can't we use flags like
__GFP_NOFAIL to explicitly enable this new behavior for selected
cma_alloc() users that really can't fail/retry manually again?

This would work, we would just have to undo the work done by this patch / re-introduce the GFP parameter for cma_alloc(): http://lkml.kernel.org/r/20180709122019eucas1p2340da484acfcc932537e6014f4fd2c29~-sqTPJKij2939229392eucas1p2j@xxxxxxxxxxxxxxxxxxxx , and add the support __GFP_NOFAIL (and ignore any flag that is not one of __GFP_NOFAIL or __GFP_NOWARN).

--
The Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project