On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote:
> On 10/4/23 17:29, Thomas Hellström wrote:
> > On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote:
> > > On 10/3/23 11:11, Thomas Hellström wrote:
> > > > <snip>
> > > > > +
> > > > > +/**
> > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the
> > > > > + * &drm_gpuvms evicted list
> > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove
> > > > > + * @evict: indicates whether the object is evicted
> > > > > + *
> > > > > + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list.
> > > > > + */
> > > > > +void
> > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
> > > > > +{
> > > > > +	struct drm_gem_object *obj = vm_bo->obj;
> > > > > +
> > > > > +	dma_resv_assert_held(obj->resv);
> > > > > +
> > > > > +	/* Always lock list transactions, even if DRM_GPUVM_RESV_PROTECTED
> > > > > +	 * is set. This is required to protect multiple concurrent calls to
> > > > > +	 * drm_gpuvm_bo_evict() with BOs with different dma_resv.
> > > > > +	 */
> > > > This doesn't work. The RESV_PROTECTED case requires the evicted
> > > > flag we discussed before. The list is either protected by the
> > > > spinlock or the resv. Otherwise a list add could race with a
> > > > list removal elsewhere.
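
(For context, the list transaction that the comment in the patch guards
presumably continues roughly along the lines of the sketch below; gpuvm
here would be vm_bo->vm, and the evict.lock / evict.list /
list.entry.evict names are assumptions loosely based on the series, not
copied from the patch.)

	/*
	 * Sketch only: assumed field names. Runs with obj->resv held,
	 * as asserted earlier in the function.
	 */
	spin_lock(&gpuvm->evict.lock);
	if (evict) {
		if (list_empty(&vm_bo->list.entry.evict))
			list_add_tail(&vm_bo->list.entry.evict,
				      &gpuvm->evict.list);
	} else {
		list_del_init(&vm_bo->list.entry.evict);
	}
	spin_unlock(&gpuvm->evict.lock);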
> > > I think it does, unless I'm missing something, though it might be
> > > a bit subtle.
> > >
> > > Concurrent drm_gpuvm_bo_evict() calls are protected by the
> > > spinlock. Additionally, when drm_gpuvm_bo_evict() is called we
> > > hold the dma-resv of the corresponding GEM object. In
> > > drm_gpuvm_validate() I assert that we hold *all* dma-resv locks,
> > > which implies that no one can call drm_gpuvm_bo_evict() on any of
> > > the VM's objects, and no one can add a new one and directly call
> > > drm_gpuvm_bo_evict() on it either.
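
(A minimal sketch of the invariant described above, assuming the same
hypothetical field names as before: if drm_gpuvm_validate() asserts
that every BO's resv is held, the evict list can be walked lock-free.)

	struct drm_gpuvm_bo *vm_bo;

	/*
	 * Sketch only: with every BO's dma-resv locked (e.g. via
	 * drm_exec), no concurrent drm_gpuvm_bo_evict() can run, so
	 * the evict list can be walked without taking the spinlock.
	 */
	list_for_each_entry(vm_bo, &gpuvm->evict.list, list.entry.evict) {
		dma_resv_assert_held(vm_bo->obj->resv);
		/* ... validate / revalidate vm_bo->obj here ... */
	}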
> > But translated into how the data (the list in this case) is
> > protected, that becomes "either the spinlock plus the bo resv of a
> > single list item, OR the bo resvs of all bos that can potentially
> > be on the list". While this is certainly possible to assert, any
> > new / future code that manipulates the evict list will probably get
> > this wrong, and as a result the code becomes pretty fragile. I
> > think drm_gpuvm_bo_destroy() already gets it wrong in that, while
> > holding a single resv, it doesn't take the spinlock.
> That's true and I don't like it either. Unfortunately, with the
> dma-resv locking scheme we can't really protect the evict list
> properly without the drm_gpuvm_bo::evicted trick.
>
> But as pointed out in my other reply, I'm a bit worried about the
> drm_gpuvm_bo::evicted trick being too restrictive, though maybe it's
> fine doing it in the RESV_PROTECTED case only.
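
(For reference, a minimal sketch of what the drm_gpuvm_bo::evicted
trick could look like in the RESV_PROTECTED case; everything except
the names already mentioned in the thread is an assumption.)

	void
	drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
	{
		dma_resv_assert_held(vm_bo->obj->resv);

		/*
		 * Sketch: only record the state under the BO's resv.
		 * drm_gpuvm_validate(), which runs with *all* resvs
		 * held, would move the vm_bo on / off the evict list
		 * based on this flag.
		 */
		vm_bo->evicted = evict;
	}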
Ah, indeed. I misread that as discussing the current code rather than
the drm_gpuvm_bo::evicted trick. If validating only a subset, or a
range, then the drm_gpuvm_bo::evicted trick would be valid only for
that subset.

But the current code would break, because the condition of locking
"the resvs of all bos that can potentially be on the list" doesn't
hold anymore, and you'd get list corruption.
What *would* work, though, is the solution currently in xe: the
original evict list, plus a staging evict list whose items are copied
over on validation. The staging evict list would be protected by the
spinlock, the original evict list by the resv, and they'd use separate
list heads in the drm_gpuvm_bo. But that is yet another complication.
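
(A minimal sketch of the copy-over step of that two-list scheme; the
evict.staged list head is a made-up name, not taken from xe or the
series.)

	/*
	 * Sketch: in the validation path, with the resv protecting the
	 * "real" evict list already held, drain the spinlock-protected
	 * staging list into it.
	 */
	spin_lock(&gpuvm->evict.lock);
	list_splice_tail_init(&gpuvm->evict.staged, &gpuvm->evict.list);
	spin_unlock(&gpuvm->evict.lock);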
But I think if this becomes an issue, those VMs (perhaps OpenGL UMD
VMs) that only want to validate a subset would simply rely on the
current non-RESV solution initially. It looks like it's only a matter
of flipping the flag on a per-vm basis.
/Thomas
> > So I think that needs fixing, and if keeping that protection, I
> > think it needs to be documented with the list member, and ideally
> > an assert. But also note that lockdep_assert_held() will typically
> > give false positives for dma_resv locks: as long as the first
> > dma_resv lock locked in a drm_exec sequence remains locked, lockdep
> > thinks *all* dma_resv locks are held (or something along those
> > lines), so the resv lockdep asserts are currently pretty useless.
> >
> > /Thomas
> > > > Thanks,
> > > > Thomas