I've been seeing the same scenario about 2-3 times a week: kswapd and one
or more other processes all CPU-bound, together adding up to 100% CPU. I've
had 'esdplay' hang on several occasions, and 2-3 times it's been
xscreensaver (3.29) that hung.
The 'hung' processes are consistently immune to kill -9, even as root, which
suggests to me that they're stuck inside a kernel call and never get back to
the point where the signal would be delivered.
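A rough way to check the "stuck in the kernel" theory from user space would
be to sample /proc/<pid>/stat twice and see whether stime (kernel-mode ticks)
keeps climbing while utime stays flat. The sketch below only parses one line
of that file. parse_stat() is a name I made up, and the field positions
(state is field 3, utime 14, stime 15) are as I understand the proc layout,
so treat it as an illustration rather than a finished tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from any existing tool): pull the state
 * letter and the utime/stime tick counters out of one line of
 * /proc/<pid>/stat.  Returns 0 on success, -1 if the line doesn't parse.
 */
int parse_stat(const char *line, char *state,
               unsigned long *utime, unsigned long *stime)
{
    /* comm may contain spaces or ')', so start after the LAST ')' */
    const char *p = strrchr(line, ')');
    int n;

    if (p == NULL)
        return -1;

    /* after state: skip ppid..tpgid (5 ints) and flags..cmajflt (5 unsigned) */
    n = sscanf(p + 1,
               " %c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
               state, utime, stime);
    return n == 3 ? 0 : -1;
}
```

Feeding it the stat line of the hung esdplay or xscreensaver a few seconds
apart and comparing the two stime values should show whether the process is
burning its CPU in kernel mode.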
Sometimes, something *else* will exit, and everything will 'break loose'
and return to normal after a minute or so.
It *may* not be related, but I also have a lot of this in 'dmesg':
__alloc_pages: 4-order allocation failed.
__alloc_pages: 3-order allocation failed.
i810_audio: DMA overrun on send
There was a recent posting re: the i810_audio driver amounting to "I've got
one bug to fix and then I'll put up a patch" for the 'dma overrun' message.
__alloc_pages doesn't give much information on who its caller was, so
that's somewhat of a dead end...
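For anyone reading along, the "N-order" in those messages means a request
for 2^N physically contiguous pages, so it can fail from fragmentation even
when plenty of single pages are free. A tiny sketch of the arithmetic,
assuming 4 KiB pages as on i386 (order_to_bytes() is just an illustrative
name, not a kernel function):

```c
/*
 * Illustrative only: size in bytes of an order-N allocation,
 * i.e. 2^order contiguous pages of page_size bytes each.
 */
unsigned long order_to_bytes(unsigned int order, unsigned long page_size)
{
    return page_size << order;
}
```

With 4096-byte pages, the 3-order failure is a 32 KiB contiguous request
and the 4-order failure is a 64 KiB one.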
In page_alloc.c, __alloc_pages() has a 'goto try_again;' which will
cause it to loop around and try to get more memory. I'm wondering if
the "hung" process is entering __alloc_pages() and getting wedged in the
'try_again' loop. That loop has a call to wakeup_kswapd() inside it, which
would explain the high context-switch rate. I'm not clear on how kswapd
can end up getting stuck and failing to free up something, unless it ends
up calling __alloc_pages() itself indirectly and the PF_MEMALLOC bit isn't
enough to get it the memory it needs, causing a deadlock/loop between
kswapd and __alloc_pages()/wakeup_kswapd().
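To make the theory concrete, here is a user-space toy model of the loop I'm
describing. It is NOT the real mm/page_alloc.c code: struct toy_mm,
toy_reclaim(), toy_alloc() and the max_tries cap are all invented for
illustration, and the only thing borrowed from the kernel is the
"fail, wake the reclaimer, goto try_again" shape.

```c
#include <stdbool.h>

/* Toy model only; every name here is made up for the example. */
struct toy_mm {
    int free_pages;        /* pages available right now */
    int reclaimable;       /* pages the reclaimer could free if it can run */
    int reclaim_cost;      /* pages the reclaimer itself needs to make progress */
    unsigned long wakeups; /* how many times the allocator poked the reclaimer */
};

/* Stand-in for kswapd: frees one page, but only if it can get its own memory. */
static bool toy_reclaim(struct toy_mm *mm)
{
    if (mm->reclaimable == 0 || mm->free_pages < mm->reclaim_cost)
        return false;      /* the reclaimer is stuck as well */
    mm->reclaimable--;
    mm->free_pages++;
    return true;
}

/*
 * Stand-in for the allocator.  Returns the number of retries needed,
 * or -1 after max_tries (a hypothetical cap so the demo terminates).
 */
static long toy_alloc(struct toy_mm *mm, int want, long max_tries)
{
    long tries = 0;

try_again:
    if (mm->free_pages >= want) {
        mm->free_pages -= want;
        return tries;
    }
    if (tries++ >= max_tries)
        return -1;
    mm->wakeups++;         /* models the wakeup_kswapd() call */
    toy_reclaim(mm);
    goto try_again;
}
```

With reclaim_cost set to 0 the allocation succeeds after a couple of
retries. With reclaim_cost set to 1 and no free pages, the reclaimer can
never make progress, and the allocator just keeps waking it until the cap
is hit, which is the kind of spin I suspect is happening.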
Unfortunately, I've just exhausted my ability to debug this one here.. ;)
I'm running the 2.4.3 kernel, with the following patches:
Reiserfs: 2.4.3-3.6.25.quota.bz2
linux-2.4.3-knfsd-6.g.patch.gz
linux-2.4.3-reiserfs-20010327.patch.bz2
IPv6: linux24-2.4.3-usagi-20010406.patch.gz
Crypto: patch-int-2.4.3.1
I'm using ReiserFS-on-LVM for basically all filesystems, if that matters...
-- 
Valdis Kletnieks
Operating Systems Analyst
Virginia Tech