On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
Hi,
since kernel 4.4 I'm unable to login to kde5 desktop (on openSUSE
Tumbleweed). There's a screen with progressbar showing the startup,
which normally fades away after reaching 100%. But with kernel 4.4, the
progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
Top shows that kwin is using few % of CPU's but mostly sleeps in poll().
When I kill it from another console, I see that everything has actually
started up, just the progressbar screen was obscuring it. The windows
obviously don't have decorations etc. Starting kwin manually again shows
me again the progressbar screen at the same position.
Hmm. Sounds like it could then be waiting for a vblank in the distant
future. There's that 1<<23 limit in the code though, but even with that
we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
Stuff to try might include enabling drm.debug=0x2f, though that'll
generate a lot of stuff. Another option would be to use the drm vblank
tracepoints to try and catch what seq number it's waiting for and
where we're at currently. Or I suppose you could just hack
up drm_wait_vblank() to print an error message or something if the
requested seq number is in the future by, say, more than a few seconds,
and if that's the case then we could try to figure out why that happens.
I have suspected that kwin is waiting for some event, but nevertheless
tried bisecting the kernel between 4.3 and 4.4, which lead to:
# first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
vblank timestamps to guesstimate how many vblanks were missed
I can confirm that 4.4 works if I revert the following commits:
63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
counter/ts for new drm_update_vblank_count() (v3)"
d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
counter/ts for new drm_update_vblank_count() (v2)"
The sha1s don't seem to match what I have, so not sure which kernel tree
you have, but looking at the radeon commit at least one thing
immediately caught my attention;
+ /* Bump counter if we are at >= leading edge of vblank,
+ * but before vsync where vpos would turn negative and
+ * the hw counter really increments.
+ */
+ if (vpos >= 0)
+ count++;
It's rather hard to see what it's really doing since the custom flags to
the get_scanout_position now cause it return non-standard things. But if
I'm reading things correctly it should really say something like:
if (vpos >= 0 && vpos < (vsync_start - vblank_start))
count++;
Hmm. Actually even that might not be correct since it could be using the
"fake" vblank start here, so might be it'd need to be something like:
if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines)
count++;
Also might be worth a shot to just ignore the hw frame counter. Eg.:
index e266ffc520d2..db728580549a 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -492,7 +492,6 @@ static struct drm_driver kms_driver = {
.lastclose = radeon_driver_lastclose_kms,
.set_busid = drm_pci_set_busid,
.unload = radeon_driver_unload_kms,
- .get_vblank_counter = radeon_get_vblank_counter_kms,
.enable_vblank = radeon_enable_vblank_kms,
.disable_vblank = radeon_disable_vblank_kms,
.get_vblank_timestamp = radeon_get_vblank_timestamp_kms,
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 979f3bf65f2c..3c5fcab74152 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev)
{
struct radeon_device *rdev = dev->dev_private;
- if (ASIC_IS_AVIVO(rdev))
- dev->max_vblank_count = 0x00ffffff;
- else
- dev->max_vblank_count = 0x001fffff;
-
return 0;
}
assuming I'm reading the code correctly.
31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank
timestamps from the irq handler"
ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to
guesstimate how many vblanks were missed"
All clean reverts, just needs some fixup on top to use abs() instead of
abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
Unfortunately I don't know if this is a kernel problem or kwin problem.
I tried to CC maintainers of both, advices what to try or what info to
provide welcome. The card is "CAICOS" with 1GB memory.
Thanks,
Vlastimil