Re: [PATCH drm-next v2 05/16] drm: manager to keep track of GPUs VA mappings

From: Liam R. Howlett
Date: Wed Mar 01 2023 - 21:39:15 EST


* Danilo Krummrich <dakr@xxxxxxxxxx> [230227 08:17]:

...
> > > Would this variant be significantly more efficient?
> >
> > Well, what you are doing is walking the tree to see if there's anything
> > there... then re-walking the tree to store it. So, yes, it's much more
> > efficient.. However, writing is heavier. How much of the time is spent
> > walking vs writing depends on the size of the tree, but it's rather easy
> > to do this in a single walk of the tree so why wouldn't you?
>
> I will, I was just curious about how much of an impact it has.
>
> >
> > >
> > > Also, would this also work while already walking the tree?
> >
> > Yes, to an extent. If you are at the correct location in the tree, you
> > can write to that location. If you are not in the correct location and
> > try to write to the tree then things will go poorly.. In this scenario,
> > we are very much walking the tree and writing to it in two steps.
> >
> > >
> > > To remove an entry while walking the tree I have a separate function
> > > drm_gpuva_iter_remove(). Would I need something similar for inserting
> > > entries?
> >
> > I saw that. Your remove function uses the erase operation which is
> > implemented as a walk to that location and a store of a null over the
> > range that is returned. You do not need a function to insert an entry
> > if the maple state is at the correct location, and that doesn't just
> > mean setting mas.index/mas.last to the correct value. There is a node &
> > offset saved in the maple state that needs to be in the correct
> > location. If you store to that node then the node may be replaced, so
> > other iterators that you have may become stale, but the one you used
> > execute the store operation will now point to the new node with the new
> > entry.
> >
> > >
> > > I already provided this example in a separate mail thread, but it may makes
> > > sense to move this to the mailing list:
> > >
> > > In __drm_gpuva_sm_map() we're iterating a given range of the tree, where the
> > > given range is the size of the newly requested mapping. __drm_gpuva_sm_map()
> > > invokes a callback for each sub-operation that needs to be taken in order to
> > > fulfill this mapping request. In most cases such a callback just creates a
> > > drm_gpuva_op object and stores it in a list.
> > >
> > > However, drivers can also implement the callback, such that they directly
> > > execute this operation within the callback.
> > >
> > > Let's have a look at the following example:
> > >
> > > 0 a 2
> > > old: |-----------| (bo_offset=n)
> > >
> > > 1 b 3
> > > req: |-----------| (bo_offset=m)
> > >
> > > 0 a' 1 b 3
> > > new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
> > >
> > > This would result in the following operations.
> > >
> > > __drm_gpuva_sm_map() finds entry "a" and calls back into the driver
> > > suggesting to re-map "a" with the new size. The driver removes entry "a"
> > > from the tree and adds "a'"
> >
> > What you have here won't work. The driver will cause your iterators
> > maple state to point to memory that is freed. You will either need to
> > pass through your iterator so that the modifications can occur with that
> > maple state so it remains valid, or you will need to invalidate the
> > iterator on every modification by the driver.
> >
> > I'm sure the first idea you have will be to invalidate the iterator, but
> > that is probably not the way to proceed. Even ignoring the unclear
> > locking of two maple states trying to modify the tree, this is rather
> > inefficient - each invalidation means a re-walk of the tree. You may as
> > well not use an iterator in this case.
> >
> > Depending on how/when the lookups occur, you could still iterate over
> > the tree and let the driver modify the ending of "a", but leave the tree
> > alone and just store b over whatever - but the failure scenarios may
> > cause you grief.
> >
> > If you pass the iterator through, then you can just use it to do your
> > writes and keep iterating as if nothing changed.
>
> Passing through the iterater clearly seems to be the way to go.
>
> I assume that if the entry to insert isn't at the location of the iterator
> (as in the following example) we can just keep walking to this location my
> changing the index of the mas and calling mas_walk()?

no. You have to mas_set() to the value and walk from the top of the
tree. mas_walk() walks down, not from side to side - well, it does go
forward within a node (increasing offset), but if you hit the node limit
then you have gotten yourself in trouble.

> This would also imply
> that the "outer" tree walk continues after the entry we just inserted,
> right?

I don't understand the "outer" tree walk statement.

>
> 1 a 3
> old: |-----------| (bo_offset=n)
>
> 0 b 2
> req: |-----------| (bo_offset=m)
>
> 0 b 2 a' 3
> new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
>
> Again, after finding "a", we want to remove it and insert "a'" instead.

Ah, so you could walk to 0, see that it's NULL from 0 - 1, call
mas_next() and get "a" from 1 - 3, write "a'" from 2 - 3:

0 1 a 2 a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)

mas_set_range(&mas, 0, 2); /* Resets the tree location to MAS_START */
mas_store(&mas, b);
0 b 2 a' 3
new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)


You can *probably* also get away with this:

walk to 0, see that it's NULL from 0 - 1, call mas_next() and get "a"
from 1 - 3, write "a'" from 2 - 3:

0 1 a 2 a' 3
broken: |-----|------|-----| (a is broken in this 1/2 step)

mas_prev(&mas, 0); /* Looking at broken a from 1-2.
mas_store(&mas, NULL); /* NULL is expanded on write to 0-2.
0 NULL 2 a' 3
broken': |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)

mas_store(&mas, b);
0 b 2 a' 3
new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)

You may want to iterate backwards and do the writes as you go until you
have enough room.. it really depends how you want to go about doing
things.

>
> >
> > >
> > > __drm_gpuva_sm_map(), ideally, continues the loop searching for nodes
> > > starting from the end of "a" (which is 2) till the end of the requested
> > > mapping "b" (which is 3). Since it doesn't find any other mapping within
> > > this range it calls back into the driver suggesting to finally map "b".
> > >
> > > If there would have been another mapping between 2 and 3 it would have
> > > called back into the driver asking to unmap this mapping beforehand.
> > >
> > > So, it boils down to re-mapping as described at the beginning (and
> > > analogously at the end) of a new mapping range and removing of entries that
> > > are enclosed by the new mapping range.
> >
> > I assume the unmapped area is no longer needed, and the 're-map' is
> > really a removal of information? Otherwise I'd suggest searching for a
> > gap which fits your request. What you have here is a lot like
> > "MAP_FIXED" vs top-down/bottom-up search in the VMA code, this seems to
> > be like your __drm_gpuva_sm_map() and the drm mm range allocator with
> > DRM_MM_INSERT_LOW, and DRM_MM_INSERT_HIGH.
> >
> > Why can these split/unmappings fail? Is it because they are still
> > needed?
> >
>
> You mean the check before the mas_*() operations in drm_gpuva_insert()?

Yes, the callbacks.

>
> Removing entries should never fail, inserting entries should fail when the
> caller tries to store to an area outside of the VA space (it doesn't
> necessarily span the whole 64-bit space), a kernel reserved area of the VA
> space, is not in any pre-allocated range of the VA space (if regions are
> enabled) or an entry already exists at that location.

In the mmap code, I have to deal with splitting the start/end VMA and
removing any VMAs in the way. I do this by making a 'detached' tree
that is dealt with later, then just overwriting the area with one
mas_store() operation. Would something like that work for you?

>
> > >
> > > > > + if (unlikely(ret))
> > > > > + return ret;
> > > > > +
> > > > > + va->mgr = mgr;
> > > > > + va->region = reg;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_insert);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_remove - remove a &drm_gpuva
> > > > > + * @va: the &drm_gpuva to remove
> > > > > + *
> > > > > + * This removes the given &va from the underlaying tree.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuva_remove(struct drm_gpuva *va)
> > > > > +{
> > > > > + MA_STATE(mas, &va->mgr->va_mt, va->va.addr, 0);
> > > > > +
> > > > > + mas_erase(&mas);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_remove);
> > > > > +
> > > > ...
> > > >
> > > > > +/**
> > > > > + * drm_gpuva_find_first - find the first &drm_gpuva in the given range
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the first &drm_gpuva within the given range
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_first(struct drm_gpuva_manager *mgr,
> > > > > + u64 addr, u64 range)
> > > > > +{
> > > > > + MA_STATE(mas, &mgr->va_mt, addr, 0);
> > > > > +
> > > > > + return mas_find(&mas, addr + range - 1);
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find_first);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find - find a &drm_gpuva
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @addr: the &drm_gpuvas address
> > > > > + * @range: the &drm_gpuvas range
> > > > > + *
> > > > > + * Returns: the &drm_gpuva at a given &addr and with a given &range
> > > >
> > > > Note that mas_find() will continue upwards in the address space if there
> > > > isn't anything at @addr. This means that &drm_gpuva may not be at
> > > > &addr. If you want to check just at &addr, use mas_walk().
> > >
> > > Good catch. drm_gpuva_find() should then either also check for 'va->va.addr
> > > == addr' as well or, alternatively, use mas_walk(). As above, any reason to
> > > prefer mas_walk()?

I think I missed this question last time..

Internally, mas_find() is just a mas_walk() on the first call, then
mas_next() for each call after that. If, during the mas_walk(), there
is no value at addr, it immediately calls mas_next() to get a value to
return. It will continue upwards until the limit is reached (addr +
range - 1 in your case).

So if you only want to know if there is something at addr, then it's
best to use mas_walk() and keep things a bit more efficient. Then you
can check mas.last for your end value.

If you do want the first VMA within the range passed in, then mas_find()
is the function you want.

> > >
> > > >
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find(struct drm_gpuva_manager *mgr,
> > > > > + u64 addr, u64 range)
> > > > > +{
> > > > > + struct drm_gpuva *va;
> > > > > +
> > > > > + va = drm_gpuva_find_first(mgr, addr, range);
> > > > > + if (!va)
> > > > > + goto out;
> > > > > +
> > > > > + if (va->va.range != range)
> > > > > + goto out;
> > > > > +
> > > > > + return va;
> > > > > +
> > > > > +out:
> > > > > + return NULL;
> > > > > +}
> > > > > +EXPORT_SYMBOL(drm_gpuva_find);
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuva_find_prev - find the &drm_gpuva before the given address
> > > > > + * @mgr: the &drm_gpuva_manager to search in
> > > > > + * @start: the given GPU VA's start address
> > > > > + *
> > > > > + * Find the adjacent &drm_gpuva before the GPU VA with given &start address.
> > > > > + *
> > > > > + * Note that if there is any free space between the GPU VA mappings no mapping
> > > > > + * is returned.
> > > > > + *
> > > > > + * Returns: a pointer to the found &drm_gpuva or NULL if none was found
> > > > > + */
> > > > > +struct drm_gpuva *
> > > > > +drm_gpuva_find_prev(struct drm_gpuva_manager *mgr, u64 start)
> > > >
> > > > find_prev() usually continues beyond 1 less than the address. I found
> > > > this name confusing.
> > >
> > > Don't really get that, mind explaining?
> >
> > When I ask for the previous one in a list or tree, I think the one
> > before.. but since you are limiting your search from start to start - 1,
> > you may as well walk to start - 1 and see if one exists.
> >
> > Is that what you meant to do here?
>
> Yes, I want to know whether there is a previous entry which ends right
> before the current entry, without a gap between the two.
>
> >
> > >
> > > > You may as well use mas_walk(), it would be faster.
> > >
> > > How would I use mas_walk() for that? If I understand it correctly,
> > > mas_walk() requires me to know that start address, which I don't know for
> > > the previous entry.
> >
> > mas_walk() walks to the value you specify and returns the entry at that
> > address, not necessarily the start address, but any address in the
> > range.
> >
> > If you have a tree and store A = [0x1000 - 0x2000] and set your maple
> > state to walk to 0x1500, mas_walk() will return A, and the maple state
> > will have mas.index = 0x1000 and mas.last = 0x2000.
> >
> > You have set the maple state to start at "start" and called
> > mas_prev(&mas, start - 1). start - 1 is the lower limit, so the
> > internal implementation will walk to start then go to the previous entry
> > until start - 1.. it will stop at start - 1 and return NULL if there
> > isn't one there.
>
> Thanks for the clarification and all the other very helpful comments and
> explanations!
>

Always glad to help. The more users the tree has, the more I can see
where we may need to expand the interface to help others.

...