Re: drm_sched run_job and scheduling latency

From: Boris Brezillon

Date: Thu Mar 05 2026 - 04:42:35 EST

Hi Tvrtko,

On Thu, 5 Mar 2026 08:35:33 +0000
Tvrtko Ursulin <tursulin@xxxxxxxxxxx> wrote:

> On 04/03/2026 22:51, Chia-I Wu wrote:
> > Hi,
> >
> > Our system compositor (surfaceflinger on android) submits gpu jobs
> > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > to run_job can sometimes cause frame misses. We are seeing this on
> > panthor and xe, but the issue should be common to all drm_sched users.
> >
> > Using a WQ_HIGHPRI workqueue helps, but it is still not RT (and won't
> > meet future android requirements). It seems either workqueue needs to
> > gain RT support, or drm_sched needs to support kthread_worker.
> >
> > I know drm_sched switched from kthread_worker to workqueue for better
>
> From a plain kthread actually.

Oops, sorry, I hadn't seen your reply before posting mine. I basically
said the same.

> Anyway, I suggested trying the
> kthread_worker approach a few times in the past but never got round to
> implementing it. Not dual paths, but simply replacing the workqueues
> with kthread_workers.
>
> What is your thinking on how the priority would be configured, in
> terms of the default and the mechanism to select a higher-priority
> scheduling class?

If we follow the model that exists today, where the workqueue can be
passed at drm_sched_init() time, it becomes the driver's responsibility
to create its own worker with the right priority set (using
sched_setscheduler()). There's still the case where the worker is NULL,
in which case the drm_sched code can probably create its own worker and
leave it at the default priority, as it did with the kthread before the
transition to workqueues.
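
To make the ownership rule concrete, here is a minimal sketch in plain
userspace C. It is not kernel code and not a proposed patch: every
model_* name is made up for illustration, no real thread is created,
and the policy field only stands in for whatever the driver would set
with sched_setscheduler() on a real worker. It only shows the
"borrow the caller's worker, or create a default one when NULL" logic.

```c
/* Userspace model only: none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the scheduling class a real worker thread would have. */
enum model_policy { MODEL_SCHED_NORMAL, MODEL_SCHED_FIFO };

/* Hypothetical stand-in for a kthread_worker plus its scheduling setup. */
struct model_worker {
	enum model_policy policy;
	int rt_priority;	/* only meaningful for MODEL_SCHED_FIFO */
};

/* Hypothetical stand-in for the scheduler instance. */
struct model_sched {
	struct model_worker *worker;
	bool own_worker;	/* true if init allocated the worker itself */
};

/*
 * If the caller passes a worker, borrow it and leave its priority alone.
 * If the caller passes NULL, allocate one with the default policy.
 * Returns 0 on success, -1 on allocation failure.
 */
static int model_sched_init(struct model_sched *sched,
			    struct model_worker *worker)
{
	assert(sched != NULL);

	if (worker) {
		sched->worker = worker;
		sched->own_worker = false;
		return 0;
	}

	sched->worker = calloc(1, sizeof(*sched->worker));
	if (!sched->worker)
		return -1;

	sched->worker->policy = MODEL_SCHED_NORMAL;
	sched->worker->rt_priority = 0;
	sched->own_worker = true;
	return 0;
}

/* Only tear down a worker that init created; a borrowed one stays alive. */
static void model_sched_fini(struct model_sched *sched)
{
	if (sched->own_worker)
		free(sched->worker);
	sched->worker = NULL;
	sched->own_worker = false;
}
```

In this model, a driver that cares about RT latency would fill in a
model_worker with MODEL_SCHED_FIFO and pass it to model_sched_init(),
while everyone else passes NULL and gets the default. The own_worker
flag is what keeps fini from freeing a worker the driver still owns.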

It's a whole different story if you want to deal with worker pools and
do some load balancing though...

Regards,

Boris