Re: [PATCH v2 06/11] drm/panthor: Prepare the scheduler logic for FW events in IRQ context

From: Boris Brezillon

Date: Tue Jun 23 2026 - 08:53:01 EST


On Mon, 22 Jun 2026 14:49:49 +0200
Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote:

> On Wed, 20 May 2026 15:15:54 -0700
> Chia-I Wu <olvaffe@xxxxxxxxx> wrote:
>
> > > > > I collected
> > > > > some numbers with baseline, with this series, and with patch 9
> > > > > reverted at https://gitlab.freedesktop.org/panfrost/linux/-/work_items/85#note_3481308.
> > > > > Reposting the numbers here for reference:
> > > > >
> > > > > | | baseline | entire series | patch 9 reverted |
> > > > > | - | - | - | - |
> > > > > | frag job median | 2.8ms | 2.2ms | 2.2ms |
> > > > > | frag job 95% | 4.5ms | 2.8ms | 2.8ms |
> > > > > | frag job 99% | 4.9ms | 2.8ms | 2.8ms |
> > > > > | panthor-job median | 0.8us | 6.2us | 0.9us |
> > > > > | panthor-job 95% | 1.5us | 16.6us | 1.5us |
> > > > > | panthor-job 99% | 1.6us | 28.0us | 1.8us |
> > > >
> > > > panthor-job rows are the durations of the raw irq handlers, collected
> > > > from irq/irq_handler_{entry,exit}.
> > > >
> > > > frag job rows are the durations from frag jobs, collected from
> > > > gpu_scheduler/drm_sched_job_{run,done}.
> > > >
> > > > The fence signaling paths for each configuration are:
> > > >
> > > > - baseline: raw handler -> rt threaded handler -> wq job -> wq job ->
> > > > fence signal
> > > > - entire series: raw handler -> fence signal
> > > > - patch 9 reverted: raw handler -> rt threaded handler -> fence signal
> > >
> > > Just did another set of throughput tests, and I confirm the gains are
> > > noticeable only with patch 9 applied (that's on rk3588, which embeds a
> > > G610, so not the exact same setup). As an example, on
> > > gfxbench/gl_manhattan, I get the following score bump 2391 -> 2457.
> > >
> > > Now I need to set things up to measure latency like you did and make
> > > sure I'm observing the same thing: threaded handlers providing roughly
> > > the same latency as hardirq handlers. If not it probably has to do with
> > > some config options that differ and change the preemptability of the
> > > system.
> > >
> > > I'll hold off on the submission of v3 until this is done, because if
> > > threaded handlers are roughly as efficient as hardirq ones, we probably
> > > want to stick to threaded handlers.
>
> Sorry for the delay, I only got back to this on Friday.
>
> So, I've been using ftrace/function-graph, with some noinline
> annotations added, to get a sense of where most of the time is spent
> in the hardirq handler after the transition to hardirqs. Contrary to
> what I expected, it's not coming from the accesses to uncached
> mappings of the FW interface/syncobjs, but from the various
> queue[_delayed]_work() calls and/or the wake_up_all() on
> panthor_fw::req_waitqueue. I don't expect us to
> be able to optimize that anytime soon, so I guess we should just keep
> everything in the threaded handler for now and accept the extra delay
> (assuming 20+ usec for the hardirq handler is too long). This also
> means that a lot of the things I do in this series are moot
> (irqsave/restore, using spinlocks instead of mutexes, ...), but before
> I go and rework that, I'd like to get some feedback from Steve and
> Liviu to make sure this is okay with Arm.

I ended up sending a v3 that keeps everything in the threaded handler.
I can easily go back to the previous version if needed.
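
For reference, here is a rough sketch of the arithmetic behind the
trade-off discussed above. It is not part of the series: the figures are
copied from the table quoted earlier in this thread (converted to
microseconds), and the names ROWS, pct_change() and summarize() are made
up for illustration only.

```python
# Figures copied from the quoted table, all converted to microseconds.
# Column order: baseline, entire series, patch 9 reverted.
ROWS = {
    "frag job median":    (2800.0, 2200.0, 2200.0),
    "frag job 95%":       (4500.0, 2800.0, 2800.0),
    "frag job 99%":       (4900.0, 2800.0, 2800.0),
    "panthor-job median": (0.8, 6.2, 0.9),
    "panthor-job 95%":    (1.5, 16.6, 1.5),
    "panthor-job 99%":    (1.6, 28.0, 1.8),
}


def pct_change(base, val):
    """Relative change of val against base, in percent."""
    return (val - base) / base * 100.0


def summarize(rows):
    """Map each row name to (series vs baseline, patch 9 reverted vs baseline)."""
    return {
        name: (pct_change(base, series), pct_change(base, reverted))
        for name, (base, series, reverted) in rows.items()
    }


# Print one line per table row with both relative changes.
for name, (series, reverted) in summarize(ROWS).items():
    print(f"{name:20s} series {series:+7.1f}%  patch 9 reverted {reverted:+7.1f}%")
```

Read this way, the frag job median drops by about 21% in both
configurations, while the median time spent in the raw handler goes from
0.8us to 6.2us only when the whole series is applied. The measurements
themselves come from the quoted message, not from this snippet.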