2.3.9 bug: uninterruptible processes stuck in __find_lock_page

Kevin Buhr (buhr@stat.wisc.edu)
08 Jul 1999 11:00:16 -0500


Linus:

This is an SMP 2.3.9 kernel on a dual-processor machine. I can't see
anything in pre-patch-2.3.10-5 that would obviously fix this problem,
but the locking changes are too complicated for me to follow.

I have an Emacs process in state TASK_UNINTERRUPTIBLE stuck at the
"schedule()" call in "mm/filemap.c:__find_lock_page". It appears to
never leave this state and accumulates no CPU time, so presumably it's
just that no one's unlocking whatever page it's waiting on. There are
no messages in the kernel ring buffer.

NFS isn't compiled in, so I can eliminate one context in which
"find_lock_page" is used; the others are in "generic_file_write" and
"lookup_swap_cache". When I tried "swapoff -a", the "swapoff" process
got stuck in the same place, so I'm guessing that the culprit is
"lookup_swap_cache": something's locked a swap cache page and isn't
letting it go.

Note that I use a single swapfile on an "ext2" partition.

For what it's worth, after the "swapoff -a", "/proc/meminfo" shows:

total: used: free: shared: buffers: cached:
Mem: 64540672 61157376 3383296 44216320 499712 31244288
Swap: 0 0 0

while "/proc/swaps" shows:

Filename Type Size Used Priority
/usr/SWAPS/swap.0 file 49996 3940 -1

This is certainly what would be expected if "sys_swapoff" had blocked
in "sys_swapoff" -> "try_to_unuse" -> "read_swap_cache_async" ->
"lookup_swap_cache" -> "find_lock_page", since the flags for my
swapfile would be "SWP_USED", so "si_swapinfo" would skip counting it
in generating info for "/proc/meminfo".

Also, Shift-ScrollLock shows:

Mem-info:
Free pages: 1592kB
( Free: 398 (127 254 381)
10*4kB 82*8kB 26*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB
0*2048kB = 1592kB)
Swap cache: add 3131, delete 3033, find 1211/4389
Free swap: 46748kB
16368 pages of RAM
611 reserved pages
5138 pages shared
98 pages swap cached
14 pages in page table cache
Networking buffers in use : 384
Total network buffer allocations : 248247
Total failed network buffer allocs : 0
IP fragment buffer size : 0

I've tried to wade through the swapfile/lock interactions, but I can't
figure it out, so I hope this helps track the problem down.

Kevin <buhr@stat.wisc.edu>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/