Re: [PATCH v2 14/16] maple_tree: Refine mas_preallocate() node calculations

From: Danilo Krummrich
Date: Mon Jun 26 2023 - 14:24:31 EST


On 6/26/23 16:49, Peng Zhang wrote:


在 2023/6/26 22:27, Danilo Krummrich 写道:
On 6/26/23 15:19, Matthew Wilcox wrote:
On Mon, Jun 26, 2023 at 02:38:06AM +0200, Danilo Krummrich wrote:
On the other hand, unless I miss something (and if so, please let me know),
something is bogus with the API then.

While the documentation of the Advanced API of the maple tree explicitly
claims that the user of the API is responsible for locking, this should be
limited to the bounds set by the maple tree implementation. Which means, the
user must decide for either the internal (spin-) lock or an external lock
(which possibly goes away in the future) and acquire and release it
according to the rules maple tree enforces through lockdep checks.

Let's say one picks the internal lock. How is one supposed to ensure the
tree isn't modified using the internal lock with mas_preallocate()?

Besides that, I think the documentation should definitely mention this
limitation and give some guidance for the locking.

Currently, from an API perspective, I can't see how anyone not familiar with
the implementation details would be able to recognize this limitation.

In terms of the GPUVA manager, unfortunately, it seems like I need to drop
the maple tree and go back to using a rb-tree, since it seems there is no
sane way doing a worst-case pre-allocation that does not suffer from this
limitation.

I haven't been paying much attention here (too many other things going
on), but something's wrong.

First, you shouldn't need to preallocate.  Preallocation is only there

Unfortunately, I think we really have a case where we have to. Typically GPU mappings are created in a dma-fence signalling critical path and that is where such mappings need to be added to the maple tree. Hence, we can't do any sleeping allocations there.

for really gnarly cases.  The way this is *supposed* to work is that
the store walks down to the leaf, attempts to insert into that leaf
and tries to allocate new nodes with __GFP_NOWAIT.  If that fails,
it drops the spinlock, allocates with the gfp flags you've specified,
then rewalks the tree to retry the store, this time with allocated
nodes in its back pocket so that the store will succeed.

You are talking about mas_store_gfp() here, right? And I guess, if the tree has changed while the spinlock was dropped and even more nodes are needed it just retries until it succeeds?

But what about mas_preallocate()? What happens if the tree changed in between mas_preallocate() and mas_store_prealloc()? Does the latter one fall back to __GFP_NOWAIT in such a case? I guess not, since mas_store_prealloc() has a void return type, and __GFP_NOWAIT could fail as well.
mas_store_prealloc() will fallback to __GFP_NOWAIT and issue a warning.
If __GFP_NOWAIT allocation fails, BUG_ON() in mas_store_prealloc() will
be triggered.

Ok, so this is an absolute last resort and surely should not be relied on.

I think the maple tree should either strictly enforce (through locking policy) that this can never happen or if API wise it is OK not to lock these two is legit, return an error code rather then issue a warning and even worse call BUG_ON() in case it can't fix things up.

- Danilo



So, how to use the internal spinlock for mas_preallocate() and mas_store_prealloc() to ensure the tree can't change?