RE: drm_cflush_sg() loops for over 3ms - scheduler not running tasks.

From: David Laight
Date: Mon Jan 13 2020 - 12:39:41 EST


From: David Laight
> Sent: 13 January 2020 14:35
>
> I've been looking at why some RT processes don't get scheduled promptly.
> In my test the RT process's affinity ties it to a single cpu (this may not be such
> a good idea as it seems).
>
> What I've found is that the Intel i915 graphics driver uses the 'events_unbound'
> kernel worker thread to periodically execute drm_cflush_sg().
> (see https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_cache.c)
...
> This loop takes about 1us per iteration split fairly evenly between whatever is in
> for_each_sg_page() and drm_cflush_page().
> With a 2560x1440 display the loop count is 3600 (4 bytes/pixel) and the whole
> function takes around 3.3ms.
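
For reference, the loop count quoted above can be reproduced with a quick back-of-envelope calculation. This is only a sketch: it assumes 4096-byte pages and one loop iteration per page, and the helper names are made up for illustration (they are not kernel functions).

```python
# Back-of-envelope check of the numbers quoted above.
# Assumptions (not stated in the original mail): 4 KiB pages and one
# loop iteration per page. Function names are hypothetical.

PAGE_SIZE = 4096  # bytes, assumed


def framebuffer_pages(width, height, bytes_per_pixel=4, page_size=PAGE_SIZE):
    """Pages needed to hold one framebuffer, rounded up."""
    size = width * height * bytes_per_pixel
    return -(-size // page_size)  # ceiling division


def flush_time_ms(pages, us_per_page=1.0):
    """Estimated loop time in ms at a given per-iteration cost."""
    return pages * us_per_page / 1000.0


pages = framebuffer_pages(2560, 1440)
print(pages)                 # 3600
print(flush_time_ms(pages))  # 3.6
```

At exactly 1us per iteration this gives 3.6ms, so the measured 3.3ms corresponds to a little over 0.9us per iteration, consistent with "about 1us".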

Actually, not setting the cpu affinity makes no difference.
The process is woken up on the cpu it last ran on and sits 'waiting' until
drm_cflush_sg() finishes - even though the other cpus become idle.
There is no sign of a sched_migrate_task event 'stealing' the process.

Even worse, because 'ticket locks' are used, no other user process can
acquire the same (user) mutex or be woken from cv_wait() until the
process actually runs.

This is a 5.4.0-rc7 kernel.
I think I saw some recent scheduler patches. I can try them, at least until
the kernel no longer builds with gcc 4.7.3 :-)

David
