Am Donnerstag, 9. Januar 2003 17:02 schrieb Brian Tinsley:
>> I've been seeing the exact same thing on the same type of system in the
>> same situations. This has been causing all kinds of problems on our
>> clusters: the system live-locks for a minute or two, causes cluster
>> heartbeats to not be received, and falsely fails over when the system
>> recovers from the live-lock. The only thing I can find after the
>> live-lock is that the runtime for kswapd is abnormally high.
>> We started running sar (60 second collection interval) and were able to
>> capture some stats during this live-lock period. I've snipped some I
>> believe may be of interest. Note the missing stats between 03:59:43 and
>> 04:02:03
>> Oh BTW, this is on a stock 2.4.20 kernel (dual P3, 4GB), but I have seen
>> the same behavior on 2.4.19 and 2.4.17.
On Thu, Jan 09, 2003 at 05:42:51PM +0100, Dieter N?tzel wrote:
> I think you should have cc'ed Andrea Arcangeli <andrea@suse.de>,
> LKM and try 2.4.20-aa1. Are you sure it is a ReiserFS and not a
> kernel thing?
There simply aren't enough scenarios for this to be a mystery. Both
-aa and 2.5.x should have something in there for it: memclass-related
buffer_head stuff in -aa, and bh-stripping + "bh-less" operation (for
ext2) in 2.5.x + fewer (if any) bh's outside of actual dirty data.
Bloat monitoring scripts attached, which might provide somewhat more
useful output to capture, though they certainly don't eliminate the
need for /proc/meminfo logging. I'll also see if some of the accounting
patches can be backported and send those to Marcelo and Andrea.
Bill
This archive was generated by hypermail 2b29 : Wed Jan 15 2003 - 22:00:30 EST