How to debug system hangs (bug #15737) on start of composite WM

From: Maciej Piechotka
Date: Mon Sep 20 2010 - 23:06:53 EST


I have problems debugging system hangs (bug #15737 -
https://bugzilla.kernel.org/show_bug.cgi?id=15737) which occur on
kernels >=2.6.33 (LTS i.e. 2.6.32 is not affected).

I cannot manage to debug them, especially as the actual results evolve
along with changes in userspace (however 2.6.32 is not affected). At
various stages:

- Second start of composite WM would either hard hang or fix the
system (i.e. the problem would not be reproduced until the next reboot,
and would not be reproduced after hibernation) - recently (for 2 weeks)
false
- First start of X results in a soft hang only - true except for some
builds of 2.6.35 with relocatable kernel (???) and very old builds
(2.6.33)
- Magic SysRQ key does not work on hard hang
- If the second start of the WM was successful, DRI will after some
time report bad drawables. This is fixed by a reboot - recent (for 2
weeks)
- Switching to console and back helps (i.e. more often there is no
hang) - possibly voodoo magic but it may help
- Error messages about failing to pinpoint buffer (cause?
side-effect?)
- Running X in gdb hangs system (even if it is not on current VT)

Soft hang: X hangs but I can switch to console. Turning off X will stop
it but it takes some time (a minute?). There is nothing in logs (X,
dmesg, system...)
Hard hang: System is not responsive including network, X, sysrq etc.
This is not a kernel panic (caps lock and num lock do not blink).

Someone (probably not kernel dev) suggested that it might be connected
with removing BKL.

I am really sorry for spamming but I have run out of ideas how to debug
this problem.

Regards
