Re: Monthly md check == hung machine; how do I debug?

From: Robin Lee Powell
Date: Tue Feb 05 2008 - 12:11:03 EST


On Mon, Feb 04, 2008 at 09:40:55PM +1100, Nick Piggin wrote:
> On Monday 04 February 2008 08:21, Robin Lee Powell wrote:
> > I've got a machine with a 4 disk SATA raid10 configuration using
> > md. The entire disk is loop-AES encrypted, but that shouldn't
> > matter here.
> >
> > Once a month, Debian runs:
> >
> > /usr/share/mdadm/checkarray --cron --all --quiet
> >
> > and the machine hangs within 30 minutes of that starting.
> >
> > It seems that I can avoid the hang by not having "mdadm
> > --monitor" running, but I'm not certain if that's the case or if
> > I've just been lucky this go-round.
> >
> > I'm on kernel 2.6.23.1, my own compile thereof, x86_64, AMD
> > Athlon(tm) 64 Processor 3700+.
> >
> > I've looked through all the 2.6.23 and 2.6.24 Changelogs, and I
> > can't find anything that looks relevant.
> >
> > So, how can I (help you all) debug this?
>
> Do you have a serial console? Does it respond to pings?

No and yes.

> Can you try to get sysrq+T traces, and sysrq+P traces, and post
> them?

I played with those after you suggested it, but without serial
console had no way to capture them.

I was able to solve the problem, however, like so:

132c133
< # CONFIG_PREEMPT_NONE is not set
---
> CONFIG_PREEMPT_NONE=y
134,135c135,136
< CONFIG_PREEMPT=y
< CONFIG_PREEMPT_BKL=y
---
> # CONFIG_PREEMPT is not set
> # CONFIG_PREEMPT_BKL is not set

-Robin

--
Lojban Reason #17: http://en.wikipedia.org/wiki/Buffalo_buffalo
Proud Supporter of the Singularity Institute - http://singinst.org/
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/