Re: Monthly md check == hung machine; how do I debug?

From: Neil Brown
Date: Tue Feb 05 2008 - 15:28:24 EST


On Tuesday February 5, rlpowell@xxxxxxxxxxxxxxxxxx wrote:
>
> I was able to solve the problem, however, like so:
>
> 132c133
> < # CONFIG_PREEMPT_NONE is not set
> ---
> > CONFIG_PREEMPT_NONE=y
> 134,135c135,136
> < CONFIG_PREEMPT=y
> < CONFIG_PREEMPT_BKL=y
> ---
> > # CONFIG_PREEMPT is not set
> > # CONFIG_PREEMPT_BKL is not set
>

This suggests that there is some sort of race.
Given that I've never hit it on SMP machines, it is probably a very
small window that opens immediately after some event that triggers
kernel preemption.

The only thing "mdadm --monitor" does in the kernel is read /proc/mdstat
and maybe make some GET_ARRAY_INFO / GET_DISK_INFO ioctl calls.

Those don't do much more than grab the reconfig_mutex.
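
To give a rough idea of how little the /proc/mdstat side involves, here is a
minimal user-space sketch in Python. It is not mdadm's code (mdadm is written
in C), it does not issue the GET_ARRAY_INFO / GET_DISK_INFO ioctls, and the
names parse_mdstat and DEVICE are made up for illustration. It assumes the
usual "mdN : state level dev[n] ..." layout of an array line.

```python
import os
import re

# Hypothetical helper pattern: a member device such as "sda1[0]" or "sdb1[1](F)".
DEVICE = re.compile(r"^([\w/-]+)\[(\d+)\](\([A-Z]\))?$")


def parse_mdstat(text):
    """Return {array: {"state", "level", "devices"}} from /proc/mdstat text."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
        m = re.match(r"^(md\S*) : (\S+)(.*)$", line)
        if m is None:
            continue
        name, state, rest = m.groups()
        level = None
        devices = []
        for tok in rest.split():
            d = DEVICE.match(tok)
            if d:
                devices.append(d.group(1))
            elif level is None and not tok.startswith("("):
                # First plain token after the state is taken as the RAID level.
                level = tok
        arrays[name] = {"state": state, "level": level, "devices": devices}
    return arrays


if __name__ == "__main__":
    # Each read of this file is what ends up in the md code in the kernel.
    if os.path.exists("/proc/mdstat"):
        with open("/proc/mdstat") as f:
            for name, info in parse_mdstat(f.read()).items():
                print(name, info["state"], info["level"], info["devices"])
```

Running something like this in a loop while the check is in progress exercises
only the /proc/mdstat read, which could help narrow down whether that read
alone is enough to trigger the hang.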

What sort of hardware do you have? x86? SMP or uni-processor?
Also, exactly what kernel are you running?

I might see if I can reproduce it... so if you can send me the broken
.config, that might help too.
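
If it is easier than sending whole files, the preemption options of two
.config files can be compared with a small script. The following is only a
sketch: preempt_options and diff_options are illustrative names, and it
assumes the standard "CONFIG_X=value" / "# CONFIG_X is not set" line formats.

```python
import re


def preempt_options(config_text):
    """Collect CONFIG_PREEMPT* settings; unset options are recorded as "n"."""
    opts = {}
    for line in config_text.splitlines():
        m = re.match(r"^(CONFIG_PREEMPT\w*)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_PREEMPT\w*) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_options(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Reading both files and calling diff_options(preempt_options(broken),
preempt_options(working)) lists just the options that changed between them.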

Thanks,
NeilBrown