Does the other machine have the same problems?
It does. It seems to depend on the interrupt frequency: with CONFIG_HZ=250
it only appears once a month or so, while with CONFIG_HZ=1000 it occurs
within a week. It happens a lot less often on the second machine, which
is under much lighter disk load than the first one.
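For what it's worth, a back-of-the-envelope count of timer ticks for the two settings. This is only arithmetic on the rough intervals above, assuming "once a month" means 30 days and "within a week" means 7 days; `timer_ticks` is just an illustrative helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def timer_ticks(hz, days):
    """Timer interrupts delivered at `hz` ticks per second over `days` days."""
    return hz * days * SECONDS_PER_DAY

# Assumption: "once a month" taken as 30 days, "within a week" as 7 days.
month_at_250 = timer_ticks(250, 30)
week_at_1000 = timer_ticks(1000, 7)
print(f"HZ=250,  30 days: {month_at_250:,} ticks")
print(f"HZ=1000,  7 days: {week_at_1000:,} ticks")
```

Both come out around 6 x 10^8 ticks (648,000,000 vs 604,800,000). Given how rough the intervals are, that may be a coincidence, but it would fit a problem that scales with the number of timer interrupts rather than with wall-clock time.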
Are you able to rule out a hardware failure?
Well... It's too much of a coincidence that two (almost identical) machines
show the same weird behaviour. What strikes me is that after a while only
*disk* interrupts stop being handled. The machine itself is alive, but all
disk I/O is blocked, which makes it pretty much useless.
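One way to confirm that it really is the disk IRQ that stops firing is to watch its counter in /proc/interrupts while the box is under load. The sketch below is a hypothetical helper, not an existing tool: it assumes the usual layout of a CPU header line followed by "IRQ: per-CPU counts description" lines, and the default match string "ide" is an assumption, since the name depends on the disk driver in use. Note that an idle disk legitimately generates no interrupts, so a stalled counter only means something while I/O is pending.

```python
import time

def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header line: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:  # leading numeric fields are per-CPU counts
            if not field.isdigit():
                break
            counts.append(int(field))
        result[irq.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return result

def stalled(before, after, pattern):
    """IRQs whose description contains `pattern` and whose total count did not change."""
    return [irq for irq, (count, desc) in after.items()
            if pattern in desc and irq in before and before[irq][0] == count]

def watch(pattern="ide", interval=10.0):
    """Poll /proc/interrupts forever, reporting matching IRQs that stop counting."""
    with open("/proc/interrupts") as f:
        previous = parse_interrupts(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/interrupts") as f:
            current = parse_interrupts(f.read())
        for irq in stalled(previous, current, pattern):
            print(f"IRQ {irq} ({current[irq][1]}): no new interrupts in {interval}s")
        previous = current
```

Calling `watch()` on the affected machine (with the pattern adjusted to the actual disk controller name) should print a line as soon as the disk IRQ counter freezes while I/O is still outstanding.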
Erich, could this be some sort of hardware problem? I know it's a PITA to
reproduce, but setting CONFIG_HZ to 1000 and bashing the machine with
disk activity seems to help :)
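For anyone trying to reproduce it, a minimal sketch of the kind of disk load meant here. `hammer_disk` is a hypothetical helper, not a tool from this thread: it repeatedly writes a file, fsyncs it and deletes it, so every round has to reach the device instead of staying in the page cache. The file size and block size are arbitrary assumptions.

```python
import os
import tempfile

def hammer_disk(directory, iterations, file_size=1 << 20, block=b"\0" * 4096):
    """Write, fsync and delete a file `iterations` times; return total bytes written."""
    if file_size % len(block):
        raise ValueError("file_size must be a multiple of the block size")
    path = os.path.join(directory, "io_stress.tmp")
    written = 0
    for _ in range(iterations):
        with open(path, "wb") as f:
            for _ in range(file_size // len(block)):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the device
        written += file_size
        os.remove(path)
    return written

if __name__ == "__main__":
    # Example: a short run in a temporary directory. Point `directory` at a
    # filesystem on the affected disk (not tmpfs) to actually load it.
    with tempfile.TemporaryDirectory() as d:
        print(hammer_disk(d, 4))
```

Running several copies in parallel against a directory on the affected disk, on a kernel built with CONFIG_HZ=1000, is the sort of setup that should make the hang show up sooner.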
Regards,
Igmar
--
Igmar Palsenberg
JDI ICT
Zutphensestraatweg 85
6953 CJ Dieren
Tel: +31 (0)313 - 496741
Fax: +31 (0)313 - 420996
The Netherlands
mailto: i.palsenberg@xxxxxxxxxx