Re: drm_sched run_job and scheduling latency
From: Boris Brezillon
Date: Thu Mar 05 2026 - 04:42:35 EST
Hi Tvrtko,
On Thu, 5 Mar 2026 08:35:33 +0000
Tvrtko Ursulin <tursulin@xxxxxxxxxxx> wrote:
> On 04/03/2026 22:51, Chia-I Wu wrote:
> > Hi,
> >
> > Our system compositor (surfaceflinger on android) submits gpu jobs
> > from a SCHED_FIFO thread to an RT gpu queue. However, because
> > workqueue threads are SCHED_NORMAL, the scheduling latency from submit
> > to run_job can sometimes cause frame misses. We are seeing this on
> > panthor and xe, but the issue should be common to all drm_sched users.
> >
> > Using a WQ_HIGHPRI workqueue helps, but it is still not RT (and won't
> > meet future android requirements). It seems either workqueue needs to
> > gain RT support, or drm_sched needs to support kthread_worker.
> >
> > I know drm_sched switched from kthread_worker to workqueue for better
>
> From a plain kthread actually.
Oops, sorry, I hadn't seen your reply before posting mine. I basically
said the same.
> Anyway, I suggested trying the
> kthread_worker approach a few times in the past but never got round
> implementing it. Not dual paths but simply replacing the workqueues with
> kthread_workers.
>
> What is your thinking regarding how would the priority be configured? In
> terms of the default and mechanism to select a higher priority
> scheduling class.
If we follow the model that exists today, where the
workqueue can be passed at drm_sched_init() time, it becomes the
driver's responsibility to create a worker of its own with the right
prio set (using sched_setscheduler()). There's still the case where the
worker is NULL, in which case the drm_sched code can probably create
its own worker and leave it with the default prio, just as it did
before the transition to workqueues.
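To make the ownership rule concrete, here is a rough userspace model of
that init path. It is not kernel code and not a proposed patch: every
name in it (model_sched, model_worker, the MODEL_POLICY_* values) is
hypothetical, and the "worker" is a plain struct standing in for a real
thread whose priority the driver would have set.

```c
/* Userspace model only: all names are hypothetical, none are drm_sched API. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum model_policy { MODEL_POLICY_NORMAL, MODEL_POLICY_FIFO };

struct model_worker {
	enum model_policy policy;
	int rt_prio;		/* only meaningful for MODEL_POLICY_FIFO */
};

struct model_sched {
	struct model_worker *worker;
	bool own_worker;	/* true if the scheduler allocated the worker */
};

/* Stand-in for creating a worker thread and setting its priority. */
static struct model_worker *model_worker_create(enum model_policy policy,
						int rt_prio)
{
	struct model_worker *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	w->policy = policy;
	w->rt_prio = rt_prio;
	return w;
}

/*
 * If the driver passes a worker, use it as-is: its priority is the
 * driver's business. If it passes NULL, create a default-priority worker
 * and remember that we own it. Returns 0 on success, -1 on failure.
 */
static int model_sched_init(struct model_sched *s,
			    struct model_worker *driver_worker)
{
	assert(s != NULL);

	if (driver_worker) {
		s->worker = driver_worker;
		s->own_worker = false;
		return 0;
	}

	s->worker = model_worker_create(MODEL_POLICY_NORMAL, 0);
	if (!s->worker)
		return -1;
	s->own_worker = true;
	return 0;
}

/* Only free the worker if the scheduler created it. */
static void model_sched_fini(struct model_sched *s)
{
	if (s->own_worker)
		free(s->worker);
	s->worker = NULL;
	s->own_worker = false;
}
```

A driver that cares about latency would call model_worker_create() with
MODEL_POLICY_FIFO and hand the result to model_sched_init(); everyone
else passes NULL and gets the default. The own_worker flag is there so
that teardown never frees a worker the driver still owns.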
It's a whole different story if you want to deal with worker pools and
do some load balancing though...
Regards,
Boris