Re: [PATCH] Deadlock during heavy write activity to userspace NFSserver on local NFS mount
From: Avi Kivity
Date: Wed Jul 28 2004 - 03:02:17 EST
Nick Piggin wrote:
Avi Kivity wrote:
Nick Piggin wrote:
The solution is that PF_MEMALLOC tasks are allowed to access the
reserve
pool. Dependencies don't matter to this system. It would be your job to
ensure all tasks that might need to allocate memory in order to free
memory have the flag set.
In the general case that's not sufficient. What if the NFS server
wrote to ext3 via the VFS? We might have a ton of ext3 pagecache
waiting for kswapd to reclaim NFS memory, while kswapd is waiting on
the NFS server writing to ext3.
It is sufficient.
You didn't explain your example very well, but I'll assume it is the
following:
dirty NFS data -> NFS server on localhost -> ext3 filesystem.
That's what I meant, sorry for not making it clear.
So kswapd tries to reclaim some memory and writes out the dirty NFS
data. The NFS server then writes this data to ext3 (it can do this
because it is PF_MEMALLOC). The data gets written out, the NFS server
tells the client it is clean, kswapd continues.
Right?
What's stopping the NFS server from ooming the machine then? Every time
some bit of memory becomes free, the server will consume it instantly.
Eventually ext3 will not be able to write anything out because it is out
of memory.
An even more complex case is when ext3 depends on some other process,
say it is mounted on a loopback nbd.
dirty NFS data -> NFS server -> ext3 -> nbd -> nbd server on localhost
-> ext3/raw device
You can't have both the NFS server and the nbd server PF_MEMALLOC, since
the NFS server may consume all memory, then wait for the nbd server to
reclaim.
The solution I have in mind is to replace the sync allocation logic from
if (free_mem() < some_global_limit && !current->PF_MEMALLOC)
wait_for_kswapd()
to
if (free_mem() < current->limit)
wait_for_kswapd()
kswapd would have the lowest ->limit, other processes as their place in
the food chain dictates.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/