Re: PROBLEM: bug in e1000 module causes very high CPU load

From: Jesse Brandeburg
Date: Mon Jan 09 2006 - 21:39:48 EST


On 12/23/05, Leroy van Logchem <leroy.vanlogchem@xxxxxxxxx> wrote:
> <snip>
> > Yes, let the server act as usual, it just starts happening out of the blue.
> > No new hardware has been added or removed, and no new programs have been
> > installed.
>
> "Me too"

<snip>

> Is there a method which can give hints about what was going on during
> the sharply rising load? My guess is that even monitoring/sampling

Well, maybe top; or you could schedule sar to collect stats on your system at regular intervals.
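
The catch is that sampling stops being possible once the box is in trouble, so it helps to log continuously and push each sample to disk as it is taken. Here is a minimal sketch of that idea in Python. It assumes a Linux /proc/loadavg. It is not sar and records only the three load averages. The function names and the log path are hypothetical examples.

```python
import os
import time


def sample_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    with open(path) as f:
        fields = f.read().split()
    return tuple(float(x) for x in fields[:3])


def log_samples(logfile, interval=5.0, count=None, source="/proc/loadavg"):
    """Append one 'epoch load1 load5 load15' line per interval.

    Each line is flushed and fsync'ed, so whatever was logged before a
    lockup has a good chance of already being on disk.
    count=None means run forever.
    """
    n = 0
    with open(logfile, "a") as out:
        while count is None or n < count:
            one, five, fifteen = sample_loadavg(source)
            out.write("%d %.2f %.2f %.2f\n" % (time.time(), one, five, fifteen))
            out.flush()
            os.fsync(out.fileno())
            n += 1
            if count is None or n < count:
                time.sleep(interval)


if __name__ == "__main__":
    log_samples("/var/tmp/loadavg.log")  # hypothetical log location
```

The per-line fsync costs a little I/O. That is deliberate, because a buffered log is exactly what gets lost when the machine wedges.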

> isn't doable anymore once the bad situation occurs. Tips on obtaining
> information about subjects like:
> - who was using which tcp/udp connection with what bandwidth

I like a utility such as iptraf for this.
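
iptraf shows per-connection traffic interactively. A coarser but scriptable stand-in is to diff the byte counters in /proc/net/dev. Note that this is a different technique and works per interface, not per connection. Below is a sketch of that approach. The function names are hypothetical, and counter wraparound (32-bit counters on many kernels) is not handled.

```python
import time


def read_net_dev(text):
    """Parse /proc/net/dev contents into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # older kernels print "eth0:1234" with no space
        fields = data.split()
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, seconds):
    """Bytes per second per interface between two snapshots."""
    out = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            out[iface] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return out


def snapshot(path="/proc/net/dev"):
    with open(path) as f:
        return read_net_dev(f.read())


if __name__ == "__main__":
    a = snapshot()
    time.sleep(1.0)
    b = snapshot()
    for iface, (rx, tx) in sorted(rates(a, b, 1.0).items()):
        print("%-8s rx %12.0f B/s  tx %12.0f B/s" % (iface, rx, tx))
```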

> - who was generating how many reads/writes on which filesystem, incl. location

Hm, that's a little tougher; nfsstat doesn't give you that, does it?

> - etc etc.
> are more than welcome too. Does using profile=2 and storing
> readprofile output to files every 5 seconds give enough information to
> tackle the source of this problem?

Yes, I think that would certainly help figure out what happens at the TOD :-)
You could also enable SysRq in order to get a stack trace after it hangs. For
bonus points, hook up a serial console and dump the state of
all processes with SysRq.
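
Here is a sketch of the "readprofile every 5 seconds" idea. It makes a few assumptions: the kernel was booted with profile=2, readprofile is in PATH, and System.map lives at /boot/System.map-<release>. That naming is a common convention but an assumption here, so adjust it for your distro. The output directory and the helper names are hypothetical. Profile counts accumulate from boot (or from the last `readprofile -r`), so the useful signal is the difference between consecutive snapshots.

```python
import os
import subprocess
import sys
import time


def readprofile_cmd(mapfile):
    # -m points readprofile at the System.map matching the running kernel
    return ["readprofile", "-m", mapfile]


def snapshot_name(outdir, when):
    """One file per snapshot, named by local time."""
    stamp = time.strftime("profile-%Y%m%d-%H%M%S.txt", time.localtime(when))
    return os.path.join(outdir, stamp)


def take_snapshot(outdir, mapfile):
    path = snapshot_name(outdir, time.time())
    with open(path, "w") as out:
        rc = subprocess.call(readprofile_cmd(mapfile), stdout=out)
        os.fsync(out.fileno())  # get it on disk before a possible hang
    if rc != 0:
        sys.stderr.write("readprofile exited with %d (booted with profile=2?)\n" % rc)
    return path


if __name__ == "__main__":
    outdir = "/var/tmp/profiles"                      # hypothetical location
    mapfile = "/boot/System.map-" + os.uname().release  # assumed naming
    os.makedirs(outdir, exist_ok=True)
    while True:
        take_snapshot(outdir, mapfile)
        time.sleep(5)
```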

Hopefully, before it dies, you would be able to sync your drive and
reboot, to maximize your chances of getting your files fully written to disk.
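
The SysRq steps can be rehearsed from a root shell before the box hangs by writing to /proc. Below is a sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ; the helper names are hypothetical. The keys work as follows:

- 't' dumps every task's state to the kernel log (and to the serial console, if one is attached).
- 's' syncs.
- 'u' remounts filesystems read-only.
- 'b' reboots immediately.

Once the machine is actually wedged, /proc is unlikely to be reachable. At that point you would press Alt+SysRq+<key> on the console or send a BREAK followed by the key over the serial console.

```python
import time

SYSRQ_ENABLE = "/proc/sys/kernel/sysrq"
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def write_proc(path, value):
    """Write a short string to a /proc control file (needs root)."""
    with open(path, "w") as f:
        f.write(value)


def enable_sysrq(path=SYSRQ_ENABLE):
    # "1" enables every SysRq function
    write_proc(path, "1")


def dump_tasks(trigger=SYSRQ_TRIGGER):
    # 't': print the state and stack of every task to the kernel log
    write_proc(trigger, "t")


def sync_and_reboot(trigger=SYSRQ_TRIGGER, delay=2.0):
    """Emergency sync, remount read-only, then reboot.

    WARNING: on a real /proc/sysrq-trigger, 'b' reboots at once
    without unmounting anything.
    """
    sent = []
    for key in ("s", "u", "b"):
        write_proc(trigger, key)
        sent.append(key)
        if key != "b":
            time.sleep(delay)  # arbitrary pause so sync/remount can run
    return sent
```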

Jesse
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/