2.6.31 Momentary Hang

From: Danny Cox
Date: Mon Mar 22 2010 - 16:44:18 EST


Kernel Gurus,

A colleague of mine is experiencing a severe denial of service multiple
times a day. When a disk-intensive process is started (in our case, a
Subversion checkout), we observe the load average spike from 6 to 10.
All CPUs are idle, but the I/O wait time is in the thousands of
milliseconds (1500 - 3000). If we wait long enough, the load average
begins to drop, but it hovers around 5 for a couple of minutes.
Afterward, it quickly drops toward 0.
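
For reference, a per-request wait figure like the 1500 - 3000 ms above
can be cross-checked against the raw counters in /proc/diskstats. The
following is a minimal sketch, assuming the number is an average wait
per completed I/O in the style of iostat's "await"; the function names
are illustrative and not taken from any of the tools mentioned here.

```python
import time


def parse_diskstats(text):
    """Map device name -> (completed I/Os, ms spent on I/O) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def average_wait_ms(before, after):
    """Average milliseconds per completed I/O between two samples, per device."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        completed = ios_after - ios_before
        if completed > 0:  # idle devices are left out
            result[name] = (ms_after - ms_before) / completed
    return result


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two samples `interval` seconds apart and return per-device waits."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return average_wait_ms(before, after)
```

Calling sample() while the checkout is running, and comparing the
result for the two member disks against the md device, might show
whether one drive accounts for most of the delay.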

The machine is fairly new, having been purchased in the December
timeframe. It uses an ASUS P7P55D-LE with an Intel Core i7 860 and 8
GB of RAM. It is running Ubuntu 9.10 with all patches applied. The
kernel is 2.6.31-20. It uses two WD 500 GB Caviar Green drives, with
software RAID1 on 3 of the partitions: /, /boot, and /home.

During the hang, almost nothing can be started. We've been using
top, atop, vmstat, and the GNOME System Monitor to see what's occurring.
Our only hints are the high load average and the I/O wait times.
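
A high load average with idle CPUs is consistent with tasks stuck in
uninterruptible sleep (state "D"), since Linux counts those toward the
load average along with runnable tasks. The sketch below lists such
tasks by reading /proc/<pid>/stat. It is an illustration only, and the
helper names are made up for this example.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

Running uninterruptible_tasks() a few times during a hang could show
which processes are blocked, which may help narrow down whether the
stall is tied to one filesystem or one device.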

At this point, Google has been unable to provide answers, and my
colleague is ready to perform physical violence on the system. I don't
even know what I can or should measure next. Hints are welcome. A
solution would be even better, if any of the above strikes a chord.

One other data point: we have 5+ other identical systems, none of which
have this issue. My colleague notes that his system was fine for a
couple of weeks. It is possible that he installed some package that
causes this behavior. That's merely speculation, of course.

Please include me in the CC: header, as I'm not subscribed to
linux-kernel. I'd like to be, but the volume is too much.

Thanks for your time!

P.S. Things we've tried:

* Moved the drives to another (identical) machine. The issue persists.

* Disabled one of the drives in the RAID. The issue persists.


--
Danny Cox
770-236-6148
Cisco
Service Provider Video Technology Group
