On 07/02/2009 01:54 PM, Jarek Poplawski wrote:
> On Thu, Jul 02, 2009 at 01:43:49PM +0200, Andres Freund wrote:
> ...
>> I will start trying to place the issue by testing with existing
>> kernels between 2.6.30 and now.
> If you can afford your time of course this would be very helpful.
Well. Waiting for the issue to resolve itself would cost time as well
;-) I won't be able to finish this today, but perhaps some reduction
of the search space will be enough.

I lied.

I placed it between 2.6.30 and

Ok. I finally see the light. I bisected the issue down to
03347e2592078a90df818670fddf97a33eec70fb (v2.6.30-5415-g03347e2) so
far.
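
For anyone who wants to repeat the search, the bisection could be driven roughly like this. This is only a sketch: it assumes v2.6.30 is a good starting point and that the vanilla commit 52989765629e7d182b4f146050ebba0abf2cb0b7 shows the lockup, as reported in this thread. The helper names (run, bisect_start, bisect_mark) are made up for illustration, and by default the script only prints the git commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the git commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a bisection between a known-good and a known-bad revision.
# Usage: bisect_start <good-rev> <bad-rev>
bisect_start() {
    good="$1"
    bad="$2"
    run git bisect start
    run git bisect bad "$bad"
    run git bisect good "$good"
}

# After building and booting the kernel git checked out, record the
# verdict: "good" (no soft lockup seen) or "bad" (soft lockup seen).
bisect_mark() {
    case "$1" in
        good|bad) run git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 1 ;;
    esac
}
```

Run inside a kernel tree, `bisect_start v2.6.30 52989765629e7d182b4f146050ebba0abf2cb0b7` would set up the range, and `bisect_mark good` or `bisect_mark bad` would be called after each test boot. Since the lockup sometimes takes minutes to appear, a kernel should only be marked good after a generous wait.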
While playing around with netem (time-based rather than packet-count-
based loss bursts) I experienced soft lockups several times. To rule
out my modifications as the cause, I recompiled with the original code,
and it is still locking up. I captured several of those traces via the
thankfully still working netconsole. The simplest policy I could
reproduce the error with was:

    tc qdisc add dev eth0 root handle 1: netem delay 10ms loss 0

I could not reproduce the error without delay - but that may only be
a timing issue, as the host I was mainly transferring data to was on
a local network. I could not reproduce the issue on lo.
The time to reproduce the error varied from seconds after executing
tc to several minutes.
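
The reproduction described above could be scripted roughly as follows. This is a sketch under assumptions: the function names (run, netem_add, netem_show, netem_del) are invented for illustration, eth0 and the 10ms/0 values are simply the ones from the policy in this report, and by default the script only prints the tc commands rather than executing them (running them for real needs root).

```shell
#!/bin/sh
# Sketch only: prints the tc commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install netem as the root qdisc. Usage: netem_add <dev> <delay> <loss>
netem_add() {
    run tc qdisc add dev "$1" root handle 1: netem delay "$2" loss "$3"
}

# Show the qdisc and its statistics. Usage: netem_show <dev>
netem_show() {
    run tc -s qdisc show dev "$1"
}

# Remove the root qdisc again. Usage: netem_del <dev>
netem_del() {
    run tc qdisc del dev "$1" root
}
```

A test run would call `netem_add eth0 10ms 0`, then transfer data to a host on the local network by whatever means is at hand while watching netconsole, and finally call `netem_del eth0`. Given the timing reported above, the traffic should be kept going for at least several minutes before concluding that a kernel is unaffected.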
Traces 5 and 6 were made with vanilla
52989765629e7d182b4f146050ebba0abf2cb0b7.
The earlier traces were made with parts of my patches applied and are
included only for completeness: I don't believe my modifications were
causing this, and since all traces are different they may give some
clues.
Lockdep was enabled but did not diagnose anything relevant (one dvb
warning during bootup).
Any ideas for debugging?