Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
From: Mario Kleiner
Date: Wed Jan 20 2016 - 15:32:40 EST
On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I'm currently running...
>>
>> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>>
>> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far i
>> can't trigger a hang after hundreds of runs.
>>
>> Does this also hang for you?
>
> No, test mode seems to be fine.
>
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
>
> Attached. Captured by having kdm running, switching to console, running
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
> frozen splashscreen, switch back, terminate dmesg. So somewhere around
> the middle there should be where ksplashscreen starts...
>
>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
>
> No such warnings there.
>
>> thanks,
>> -mario
>
> Thanks,
> Vlastimil
Thanks. So the problem is that AMD's hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing the vblank
count by one during each vblank irq; I think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
The new code in drm_update_vblank_count() breaks this. The reset of the
counter to zero is treated as a counter wraparound, so our software vblank
counter jumps forward by up to 2^24 counts in response (in the case of
AMD's 24-bit hw counters). The vblank event handling code in
drm_handle_vblank_events() and other places then detects the counter being
more than 2^23 counts ahead of queued vblank events and, as part of its
own wraparound handling for the 32-bit software counter, doesn't deliver
these queued events for a long time -> no vblank swap trigger event ->
no swap -> client hangs waiting for swap completion.
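
To make the arithmetic concrete, here is a minimal sketch of the two steps
described above. It is not the actual DRM code: the helper names are
hypothetical, and the 24-bit mask and the 2^23 threshold are simply taken
from the description in this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 24-bit hardware frame counter, as on the AMD parts above. */
#define HW_COUNTER_MASK 0xffffffu

/* Hypothetical helper: number of vblanks assumed to have passed between
 * two hardware counter reads, computed modulo 2^24. A reset to zero is
 * indistinguishable from a wraparound here. */
static inline uint32_t vblank_diff(uint32_t cur_hw, uint32_t last_hw)
{
    return (cur_hw - last_hw) & HW_COUNTER_MASK;
}

/* Hypothetical helper: a queued event counts as deliverable only if the
 * software counter has reached its target and is at most 2^23 counts
 * past it, using unsigned 32-bit wraparound arithmetic. */
static inline bool event_is_delivered(uint32_t sw_count, uint32_t target_seq)
{
    return (sw_count - target_seq) <= (1u << 23);
}

/* Simulate a modeset that resets the hardware counter to zero while an
 * event is queued for the next vblank. Returns true if that event would
 * still be delivered afterwards. */
static inline bool reset_delivers_event(uint32_t last_hw)
{
    uint32_t sw_count = last_hw;        /* software count tracks hw so far */
    uint32_t target_seq = sw_count + 1; /* event queued for next vblank */

    sw_count += vblank_diff(0, last_hw); /* hw counter now reads zero */
    return event_is_delivered(sw_count, target_seq);
}
```

With a hardware count of 1000 before the reset, vblank_diff(0, 1000) is
2^24 - 1000, so the software counter ends up far more than 2^23 ahead of the
queued event and reset_delivers_event(1000) returns false. With a count of
0xC00000 before the reset, the jump is only 2^22 and the event is still
delivered, which fits the observation that the hang depends on the hardware
count at the time of the modeset.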
I think I remember seeing the ksplash progress screen occasionally
blanking halfway through login; I guess that's when kwin triggers a
modeset in parallel with ksplash doing its OpenGL animations. So depending
on the hw vblank count at the time of login, ksplash would or wouldn't
hang. Apparently I got "lucky" with my counts at login.
-mario