Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer

From: Matthew Brost

Date: Tue Mar 24 2026 - 22:33:38 EST


On Tue, Mar 24, 2026 at 09:06:02AM -0700, Matthew Brost wrote:
> On Tue, Mar 24, 2026 at 10:23:45AM +0100, Boris Brezillon wrote:
> > On Mon, 23 Mar 2026 11:38:06 -0700
> > Matthew Brost <matthew.brost@xxxxxxxxx> wrote:
> >
> > >
> > > Ok, getting stats is easier than I thought...
> > >
> > > ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions /home/mbrost/xe/source/drivers.gpu.i915.igt-gpu-tools/build/tests/xe_exec_threads --r threads-basic
> > >
> > > This test creates one thread per engine instance (7 instances on this BMG
> > > device) and submits 1k exec IOCTLs per thread, each performing a DW
> > > write. Each exec IOCTL typically does not have unsignaled input dependencies.
> > >
> > > With IRQ putting of jobs off + no bypass (drm_dep_queue_flags = 0):
> > >
> > > 8,449 context-switches
> > > 412 cpu-migrations
> > > 2,531.43 msec task-clock
> > > 1,847,846,588 cpu_atom/cycles/
> > > 1,847,856,947 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 460,744,020 cpu_core/instructions/
> > >
> > > With IRQ putting of jobs off + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
> > >
> > > 8,655 context-switches
> > > 229 cpu-migrations
> > > 2,571.33 msec task-clock
> > > 855,900,607 cpu_atom/cycles/
> > > 855,900,272 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 403,651,469 cpu_core/instructions/
> > >
> > > With IRQ putting of jobs on + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED |
> > > DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
> > >
> > > 5,361 context-switches
> > > 169 cpu-migrations
> > > 2,577.44 msec task-clock
> > > 685,769,153 cpu_atom/cycles/
> > > 685,768,407 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 321,336,297 cpu_core/instructions/
> >
> > Thanks for sharing those numbers. For completeness, can you also add the
> > "With IRQ putting of jobs on + no bypass" case?
> >
>
> Yes, I will also share a DRM sched baseline. I also figured out that
> power can be measured - initial results confirm what I expected: less
> power.
>
> I'm putting together a doc based on running glxgears and another
> benchmark on top of Ubuntu 24.10 + Wayland, which has explicit sync
> (linux-drm-syncobj, behaves like SurfaceFlinger when rendering flag to
> not pass in fences to draw jobs).
>
> Almost have all the data. Will share here once I have it.
>

Here are some numbers based on glxgears and weston-simple-egl.

5 configurations tested:
DRM sched
DRM dep (no opt flags)
DRM dep + bypass flag
DRM dep + IRQ-safe flag
DRM dep + bypass + IRQ-safe flags

Each configuration was run 3× on both glxgears and weston-simple-egl.
Raptor Lake CPU, BMG G21.
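
For anyone who wants to reduce the raw perf stat output in this mail to
a table, here is a minimal Python sketch of a parser. It is not part of
the patch series or of perf itself; the function name and the regular
expression are assumptions, and it only handles the column layout as it
appears in this mail (value, optional "msec"/"Joules" unit, event name,
no trailing "#" annotations).

```python
import re

# Matches lines such as "71,548 context-switches",
# "320,440.96 msec task-clock" or "168.76 Joules power/energy-pkg/".
# Lines starting with "<not supported>" and the "seconds time elapsed"
# line do not match and are skipped.
LINE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+(?:msec\s+|Joules\s+)?(\S+)\s*$")


def parse_perf_stat(text):
    """Return {event_name: value} for one perf stat output block."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            value = float(match.group(1).replace(",", ""))
            counters[match.group(2)] = value
    return counters


# Excerpt of the first DRM sched glxgears run from this mail.
SAMPLE = """\
Performance counter stats for 'system wide':

71,548 context-switches
1,466 cpu-migrations
320,440.96 msec task-clock
9,140,249,815 cpu_atom/cycles/
<not supported> cpu_atom/instructions/
168.76 Joules power/energy-pkg/

20.029126614 seconds time elapsed
"""

if __name__ == "__main__":
    for event, value in parse_perf_stat(SAMPLE).items():
        print(f"{event:28s} {value:,.2f}")
```

Unsupported counters are dropped rather than reported as zero, so a
missing key means "not measured" and does not skew any later averaging.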

Summary:
DRM dep reduces power usage, CPU cycles, and context switches. Enabling
both the bypass and IRQ-safe flags further reduces all of these metrics.

I’d say this test case best models something like scrolling on a phone
or using a laptop for non-GPU-intensive workloads where the screen still
needs to refresh.

I’ve also run more intensive benchmarks, glmark2 and Unigine Heaven.
The results are somewhat noisy between boots, but I think the same
conclusion holds.

Raw numbers (bit of a firehose):

DRM sched:
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.565 FPS
300 frames in 5.0 seconds = 60.000 FPS
301 frames in 5.0 seconds = 60.001 FPS

Performance counter stats for 'system wide':

71,548 context-switches
1,466 cpu-migrations
320,440.96 msec task-clock
9,140,249,815 cpu_atom/cycles/
9,140,253,058 cpu_core/cycles/
<not supported> cpu_atom/instructions/
7,071,794,806 cpu_core/instructions/
168.76 Joules power/energy-pkg/
57.78 Joules power/energy-cores/

20.029126614 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.642 FPS
300 frames in 5.0 seconds = 59.988 FPS
301 frames in 5.0 seconds = 60.001 FPS

Performance counter stats for 'system wide':

71,720 context-switches
1,581 cpu-migrations
320,530.64 msec task-clock
8,990,313,521 cpu_atom/cycles/
8,990,315,400 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,988,827,285 cpu_core/instructions/
172.15 Joules power/energy-pkg/
58.33 Joules power/energy-cores/

20.034862844 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.741 FPS
299 frames in 5.0 seconds = 59.798 FPS
299 frames in 5.0 seconds = 59.799 FPS

Performance counter stats for 'system wide':

70,871 context-switches
1,980 cpu-migrations
320,558.82 msec task-clock
8,861,481,467 cpu_atom/cycles/
8,861,485,448 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,665,294,516 cpu_core/instructions/
167.82 Joules power/energy-pkg/
56.97 Joules power/energy-cores/

20.035713155 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

27,398 context-switches
678 cpu-migrations
160,255.17 msec task-clock
5,002,546,782 cpu_atom/cycles/
5,002,549,920 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,498,672,077 cpu_core/instructions/
93.41 Joules power/energy-pkg/
23.91 Joules power/energy-cores/

10.017552274 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

27,322 context-switches
580 cpu-migrations
160,307.12 msec task-clock
4,783,734,059 cpu_atom/cycles/
4,783,737,645 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,224,510,206 cpu_core/instructions/
91.89 Joules power/energy-pkg/
23.28 Joules power/energy-cores/

10.020629190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

27,356 context-switches
573 cpu-migrations
160,362.30 msec task-clock
5,112,653,847 cpu_atom/cycles/
5,112,658,503 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,395,873,668 cpu_core/instructions/
94.40 Joules power/energy-pkg/
24.58 Joules power/energy-cores/

10.023979647 seconds time elapsed

No opt (drm_dep_queue_flags = 0):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.597 FPS
300 frames in 5.0 seconds = 59.989 FPS
297 frames in 5.0 seconds = 59.232 FPS

Performance counter stats for 'system wide':

66,233 context-switches
1,820 cpu-migrations
320,586.39 msec task-clock
9,028,164,726 cpu_atom/cycles/
9,028,178,052 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,541,478,243 cpu_core/instructions/
178.47 Joules power/energy-pkg/
44.18 Joules power/energy-cores/

20.036849235 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.691 FPS
297 frames in 5.0 seconds = 59.393 FPS
300 frames in 5.0 seconds = 59.803 FPS

Performance counter stats for 'system wide':

68,389 context-switches
2,034 cpu-migrations
320,457.18 msec task-clock
8,736,092,056 cpu_atom/cycles/
8,736,096,958 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,511,630,145 cpu_core/instructions/
183.23 Joules power/energy-pkg/
47.43 Joules power/energy-cores/

20.031469459 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.458 FPS
299 frames in 5.0 seconds = 59.606 FPS
298 frames in 5.0 seconds = 59.590 FPS

Performance counter stats for 'system wide':

67,692 context-switches
1,877 cpu-migrations
320,524.05 msec task-clock
8,837,946,224 cpu_atom/cycles/
8,837,949,628 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,018,812,170 cpu_core/instructions/
187.63 Joules power/energy-pkg/
46.76 Joules power/energy-cores/

20.034428856 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

27,259 context-switches
313 cpu-migrations
160,538.29 msec task-clock
5,079,653,975 cpu_atom/cycles/
5,079,657,432 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,166,877,411 cpu_core/instructions/
90.72 Joules power/energy-pkg/
21.70 Joules power/energy-cores/

10.034716719 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

26,933 context-switches
449 cpu-migrations
160,334.74 msec task-clock
4,851,027,105 cpu_atom/cycles/
4,851,054,678 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,042,177,215 cpu_core/instructions/
87.33 Joules power/energy-pkg/
21.85 Joules power/energy-cores/

10.021873082 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

27,101 context-switches
351 cpu-migrations
160,333.98 msec task-clock
4,903,047,240 cpu_atom/cycles/
4,903,055,111 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,884,284,727 cpu_core/instructions/
87.68 Joules power/energy-pkg/
21.36 Joules power/energy-cores/

10.021938190 seconds time elapsed

Bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.718 FPS
299 frames in 5.0 seconds = 59.615 FPS
299 frames in 5.0 seconds = 59.795 FPS

Performance counter stats for 'system wide':

56,788 context-switches
2,576 cpu-migrations
320,610.02 msec task-clock
9,056,383,522 cpu_atom/cycles/
9,056,385,629 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,285,652,796 cpu_core/instructions/
164.29 Joules power/energy-pkg/
44.70 Joules power/energy-cores/

20.041318795 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.734 FPS
300 frames in 5.0 seconds = 59.983 FPS
300 frames in 5.0 seconds = 60.000 FPS

Performance counter stats for 'system wide':

56,388 context-switches
2,326 cpu-migrations
320,581.07 msec task-clock
8,789,215,827 cpu_atom/cycles/
8,789,217,484 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,251,346,200 cpu_core/instructions/
162.67 Joules power/energy-pkg/
44.30 Joules power/energy-cores/

20.037648324 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.950 FPS
300 frames in 5.0 seconds = 59.993 FPS
300 frames in 5.0 seconds = 59.806 FPS

Performance counter stats for 'system wide':

56,167 context-switches
2,434 cpu-migrations
320,594.69 msec task-clock
8,700,873,664 cpu_atom/cycles/
8,700,877,150 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,405,556,662 cpu_core/instructions/
162.55 Joules power/energy-pkg/
43.33 Joules power/energy-cores/

20.038448851 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

24,747 context-switches
1,254 cpu-migrations
160,543.42 msec task-clock
5,047,832,024 cpu_atom/cycles/
5,047,823,996 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,124,591,155 cpu_core/instructions/
80.28 Joules power/energy-pkg/
21.49 Joules power/energy-cores/

10.034654628 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

24,953 context-switches
921 cpu-migrations
160,375.32 msec task-clock
5,197,283,835 cpu_atom/cycles/
5,197,287,623 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,393,363,950 cpu_core/instructions/
83.36 Joules power/energy-pkg/
21.92 Joules power/energy-cores/

10.024899366 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

Performance counter stats for 'system wide':

24,576 context-switches
966 cpu-migrations
160,339.37 msec task-clock
4,915,705,971 cpu_atom/cycles/
4,915,709,503 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,968,947,722 cpu_core/instructions/
79.96 Joules power/energy-pkg/
21.08 Joules power/energy-cores/

10.022743041 seconds time elapsed

IRQ (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.643 FPS
298 frames in 5.0 seconds = 59.599 FPS
295 frames in 5.0 seconds = 58.998 FPS

Performance counter stats for 'system wide':

60,305 context-switches
1,994 cpu-migrations
320,528.79 msec task-clock
8,518,549,937 cpu_atom/cycles/
8,518,573,906 cpu_core/cycles/
<not supported> cpu_atom/instructions/
5,813,890,066 cpu_core/instructions/
184.52 Joules power/energy-pkg/
40.79 Joules power/energy-cores/

20.032795872 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.759 FPS
299 frames in 5.0 seconds = 59.790 FPS
301 frames in 5.0 seconds = 60.003 FPS

Performance counter stats for 'system wide':

59,401 context-switches
2,256 cpu-migrations
320,475.03 msec task-clock
8,581,759,828 cpu_atom/cycles/
8,581,763,986 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,748,269,548 cpu_core/instructions/
179.76 Joules power/energy-pkg/
40.66 Joules power/energy-cores/

20.029861532 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.653 FPS
298 frames in 5.0 seconds = 59.404 FPS
300 frames in 5.0 seconds = 59.990 FPS

Performance counter stats for 'system wide':

59,381 context-switches
1,800 cpu-migrations
320,616.35 msec task-clock
8,829,473,025 cpu_atom/cycles/
8,829,477,019 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,505,926,710 cpu_core/instructions/
180.38 Joules power/energy-pkg/
40.86 Joules power/energy-cores/

20.040016190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

Performance counter stats for 'system wide':

27,341 context-switches
786 cpu-migrations
160,478.01 msec task-clock
4,681,440,843 cpu_atom/cycles/
4,681,443,905 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,969,039,615 cpu_core/instructions/
91.74 Joules power/energy-pkg/
20.84 Joules power/energy-cores/

10.031116623 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

24,626 context-switches
429 cpu-migrations
160,367.44 msec task-clock
4,828,015,355 cpu_atom/cycles/
4,828,019,887 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,675,419,833 cpu_core/instructions/
90.35 Joules power/energy-pkg/
21.10 Joules power/energy-cores/

10.024476921 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

24,679 context-switches
340 cpu-migrations
160,303.90 msec task-clock
4,500,129,961 cpu_atom/cycles/
4,500,132,697 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,766,150,592 cpu_core/instructions/
88.01 Joules power/energy-pkg/
19.76 Joules power/energy-cores/

10.019653353 seconds time elapsed

IRQ plus bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED | DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.958 FPS
299 frames in 5.0 seconds = 59.607 FPS
299 frames in 5.0 seconds = 59.603 FPS

Performance counter stats for 'system wide':

46,934 context-switches
1,558 cpu-migrations
320,569.83 msec task-clock
7,976,414,449 cpu_atom/cycles/
7,976,417,934 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,126,973,947 cpu_core/instructions/
178.36 Joules power/energy-pkg/
40.10 Joules power/energy-cores/

20.037681420 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.696 FPS
299 frames in 5.0 seconds = 59.616 FPS
299 frames in 5.0 seconds = 59.781 FPS

Performance counter stats for 'system wide':

47,691 context-switches
1,994 cpu-migrations
320,602.83 msec task-clock
8,270,567,663 cpu_atom/cycles/
8,270,572,484 cpu_core/cycles/
<not supported> cpu_atom/instructions/
4,361,204,861 cpu_core/instructions/
181.56 Joules power/energy-pkg/
40.16 Joules power/energy-cores/

20.038511163 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.911 FPS
298 frames in 5.0 seconds = 59.597 FPS
300 frames in 5.0 seconds = 59.803 FPS

Performance counter stats for 'system wide':

47,129 context-switches
1,921 cpu-migrations
320,491.09 msec task-clock
8,054,513,204 cpu_atom/cycles/
8,054,518,711 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,131,796,639 cpu_core/instructions/
178.54 Joules power/energy-pkg/
40.08 Joules power/energy-cores/

20.032444923 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

21,991 context-switches
286 cpu-migrations
160,343.73 msec task-clock
4,497,475,288 cpu_atom/cycles/
4,497,477,011 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,042,007,163 cpu_core/instructions/
89.14 Joules power/energy-pkg/
20.09 Joules power/energy-cores/

10.021642254 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

Performance counter stats for 'system wide':

22,366 context-switches
225 cpu-migrations
160,386.68 msec task-clock
4,398,432,348 cpu_atom/cycles/
4,398,435,205 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,086,156,274 cpu_core/instructions/
89.07 Joules power/energy-pkg/
19.68 Joules power/energy-cores/

10.024827902 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

Performance counter stats for 'system wide':

22,515 context-switches
286 cpu-migrations
160,481.91 msec task-clock
4,447,740,222 cpu_atom/cycles/
4,447,743,314 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,217,285,071 cpu_core/instructions/
90.15 Joules power/energy-pkg/
19.65 Joules power/energy-cores/

10.029135743 seconds time elapsed
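
To make the glxgears runs above easier to compare, here is a small
Python sketch that averages the three runs per configuration and
reports the change relative to DRM sched. Only two counters are
included (context-switches and power/energy-cores), copied by hand from
the output above; the dictionary layout and the summarize() helper are
illustrative assumptions, not something from the series.

```python
from statistics import mean

# (context-switches, power/energy-cores in Joules) for the three glxgears
# runs of each configuration, copied from the perf stat output in this mail.
GLXGEARS_RUNS = {
    "DRM sched": [(71548, 57.78), (71720, 58.33), (70871, 56.97)],
    "DRM dep (no opt flags)": [(66233, 44.18), (68389, 47.43), (67692, 46.76)],
    "DRM dep + bypass": [(56788, 44.70), (56388, 44.30), (56167, 43.33)],
    "DRM dep + IRQ-safe": [(60305, 40.79), (59401, 40.66), (59381, 40.86)],
    "DRM dep + bypass + IRQ-safe": [(46934, 40.10), (47691, 40.16), (47129, 40.08)],
}


def summarize(runs, baseline="DRM sched"):
    """Return {config: (mean_cs, mean_joules, cs_delta_pct, joules_delta_pct)}.

    The deltas are percentages relative to the baseline configuration.
    """
    means = {
        name: tuple(mean(column) for column in zip(*samples))
        for name, samples in runs.items()
    }
    base_cs, base_joules = means[baseline]
    return {
        name: (
            cs,
            joules,
            100.0 * (cs - base_cs) / base_cs,
            100.0 * (joules - base_joules) / base_joules,
        )
        for name, (cs, joules) in means.items()
    }


if __name__ == "__main__":
    for name, (cs, joules, cs_pct, j_pct) in summarize(GLXGEARS_RUNS).items():
        print(f"{name:30s} {cs:9.0f} ({cs_pct:+6.1f}%) {joules:6.2f} J ({j_pct:+6.1f}%)")
```

With these inputs, the bypass + IRQ-safe configuration comes out at
roughly a third fewer context switches than DRM sched. Three runs per
configuration is a small sample, so the averages are indicative only.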

Matt

> > I'm a bit surprised by the difference in number of context switches
> > given I'd expect the local-CPU to be picked in priority, and so queuing
> > work items on the same wq from another work item to be almost free in
> > terms of scheduling. But I guess there's some load-balancing happening
> > when you execute jobs at such a high rate.
> >
> > Also, I don't know if that's just noise or if it's reproducible, but
> > task-clock seems to be ~40 msec lower with the deferred cleanup and
> > no-bypass (higher throughput because you're not blocking the dequeuing
> > of the next job on the cleanup of the previous one, I suspect).
>
> I think that is just noise of what the test is doing in user space -
> that bounces around a bit.
>
> Matt
>
> >