Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister

From: Tvrtko Ursulin
Date: Wed Jul 24 2024 - 03:48:31 EST



On 23/07/2024 16:30, Lucas De Marchi wrote:
On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:

On 22/07/2024 22:06, Lucas De Marchi wrote:
Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

    1) bind device to i915
    2) wait events to show up on sysfs
    3) start perf  stat -I 1000 -e i915/rcs0-busy/
    4) unbind driver
    5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.

So effectively allows unbind to succeed without fully unbinding the driver from the device? That sounds like a significant drawback and if so, I wonder if a more complicated solution wouldn't be better after all. Or is there precedence for allowing userspace keeping their paws on unbound devices in this way?

keeping the resources alive but "unplunged" while the hardware
disappeared is a common thing to do... it's the whole point of the
drmm-managed resource for example. If you bind the driver and then
unbind it while userspace is holding a ref, next time you try to bind it
will come up with a different card number. A similar thing that could be
done is to adjust the name of the event - currently we add the mangled
pci slot.

Yes.. but what my point was this from your commit message:

"""
The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully).
"""

So the subsequent bind does not "come up with a different card number". Statement is it will come up with an error if we look at the PMU subset of functionality. I was wondering if there was precedent for that kind of situation.

Mangling the PMU driver name probably also wouldn't be great.

That said, I agree a better approach would be to allow
perf_pmu_unregister() to do its job even when there are open events. On
top of that (or as a way to help achieve that), make perf core replace
the callbacks with stubs when pmu is unregistered - that would even kill
the need for i915's checks on pmu->closed (and fix the lack thereof in
other drivers).

It can be a can of worms though and may be pushed back by perf core
maintainers, so it'd be good have their feedback.

Yeah definitely would be essential.

Regards,

Tvrtko

Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
     struct i915_pmu *pmu = res;
     struct drm_i915_private *i915 = pmu_to_i915(pmu);
+    perf_pmu_unregister(&pmu->base);
     free_event_attributes(pmu);
     kfree(pmu->base.attr_groups);
     if (IS_DGFX(i915))
         kfree(pmu->name);
+
+    /*
+     * Make sure all currently running (but shortcut on pmu->closed) are
+     * gone before proceeding with free'ing the pmu object embedded in i915.
+     */
+    synchronize_rcu();
 }
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-    GEM_BUG_ON(!pmu->base.event_init);
-
     /* Select the first online CPU as a designated reader. */
     if (cpumask_empty(&i915_pmu_cpumask))
         cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
     struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
     unsigned int target = i915_pmu_target_cpu;
-    GEM_BUG_ON(!pmu->base.event_init);
-
     /*
      * Unregistering an instance generates a CPU offline event which we must
      * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
 {
     struct i915_pmu *pmu = &i915->pmu;
-    if (!pmu->base.event_init)
-        return;
-
     /*
-     * "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-     * ensures all currently executing ones will have exited before we
-     * proceed with unregistration.
+     * "Disconnect" the PMU callbacks - unregistering the pmu will be done
+     * later when all currently open events are gone
      */
     pmu->closed = true;
-    synchronize_rcu();
     hrtimer_cancel(&pmu->timer);
-
     i915_pmu_unregister_cpuhp_state(pmu);
-    perf_pmu_unregister(&pmu->base);
     pmu->base.event_init = NULL;
 }