Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
From: Daniel Vetter
Date: Thu Jan 21 2016 - 05:09:08 EST

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> On 21.01.2016 16:58, Daniel Vetter wrote:
> > On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
> >> On 21.01.2016 15:38, Michel Dänzer wrote:
> >>> On 21.01.2016 14:31, Mario Kleiner wrote:
> >>>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> >>>>> On 21.01.2016 05:32, Mario Kleiner wrote:
> >>>>>>
> >>>>>> So the problem is that AMD's hardware frame counters reset to
> >>>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
> >>>>>> keeping vblank irqs enabled during modesets and incrementing vblank
> >>>>>> count by one during each vblank irq; I think that's what
> >>>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
> >>>>>
> >>>>> Right, looks like there's been a regression breaking this. I suspect the
> >>>>> problem is that vblank->last isn't getting updated from
> >>>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
> >>>>> to fix it. Ville?
> >>>>>
> >>>>
> >>>> The whole logic has changed and the software counter updates are now
> >>>> driven all the time by the hw counter.
> >>>>
> >>>>>
> >>>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> >>>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> >>>>> vblank counters"). I've been meaning to track that down since then; one
> >>>>> of these days hopefully, but if anybody has any ideas offhand...
> >>>>
> >>>> I spent the last few hours reading through the drm and radeon code and I
> >>>> think what should probably work is to replace the
> >>>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> >>>> calls. These are apparently meant for drivers whose hw counters reset
> >>>> during modeset, [...]
> >>>
> >>> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> >>> regression which needs to be fixed anyway. I don't think switching to
> >>> drm_vblank_on/off is suitable for stable trees.
> >>
> >> Even more so since as I mentioned, there is (has been since at least
> >> about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
> >
> > Hm, never noticed you reported that. I thought the reason for not picking
> > up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace
> > where it tried to use vblank waits on a disabled pipe?
>
> http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
>
> I don't know why it didn't get picked up.

Yeah, checking my tree your ack is indeed in there. I think I'll resend
them.

> > Can you please point me at the vblank on/off jump bug?
>
> AFAIR I originally reported it in response to
> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> , but I can't find that in the archives, so maybe that was just on IRC.
> See
> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> . Basically, I ran into the bug fixed by your patch because the counter
> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> just a few days.

OK, so that just uncovered the overflow bug.
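
To make the failure mode discussed above concrete, here is a minimal sketch of how a software vblank count that is driven by the hardware counter can jump forward when that hardware counter resets to zero. This is a simplified model, not the DRM core's actual code: the struct, the function names and the idea of passing the counter mask as a parameter are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified model; the names are illustrative only. */
struct vblank_model {
    uint32_t count; /* software vblank count, as seen by userspace */
    uint32_t last;  /* hardware counter value at the previous update */
};

/*
 * Frames elapsed between two hardware counter samples, assuming the
 * hardware counter wraps at max_count (a mask of the form 2^n - 1).
 */
static uint32_t hw_counter_diff(uint32_t cur_hw, uint32_t last_hw,
                                uint32_t max_count)
{
    assert(max_count != 0);
    return (cur_hw - last_hw) & max_count;
}

/* Advance the software count by whatever the hardware counter moved. */
static void model_update(struct vblank_model *v, uint32_t cur_hw,
                         uint32_t max_count)
{
    v->count += hw_counter_diff(cur_hw, v->last, max_count);
    v->last = cur_hw;
}
```

In this model, if the hardware counter goes from 1001 back to 0 (a reset rather than a real wraparound) and a 24-bit mask of 0x00ffffff is assumed, the masked difference is 16776215 instead of a small number, so the software count leaps forward by almost 2^24. A few hundred such jumps are enough to carry a 32-bit count past its boundary, which would be consistent with the report quoted above.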
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch