Nick Piggin wrote:
What's stopping the NFS server from OOMing the machine then? Every time some bit of memory becomes free, the server will consume it instantly. Eventually ext3 will not be able to write anything out because it is out of memory.
The NFS server should do the writeout a page at a time.
The NFS server writes not only in response to page reclaim (as a local NFS client), but also in response to pressure from non-local clients. If both ext3 and NFS have the same allocation limits, NFS may starve out ext3.
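The starvation argument can be played out in a toy model. This is a minimal sketch, not kernel code: it assumes one global pool of free pages and one shared threshold, and the names `struct pool`, `try_alloc`, `release_page` and `nfs_take_all` are hypothetical. The NFS server, pushed by remote clients, allocates whenever the threshold allows. ext3, which needs memory to finish the writeout that would free pages, then finds nothing left above the same threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one global pool of free pages, not kernel code. */
struct pool {
    int free_pages;
};

/*
 * Take one page unless free memory is already below `limit`, mirroring
 * "if (free_mem() < limit) wait". Returns false when the caller would
 * have to block instead.
 */
static bool try_alloc(struct pool *p, int limit)
{
    if (p->free_pages == 0 || p->free_pages < limit)
        return false;
    p->free_pages--;
    return true;
}

/* Give one page back, e.g. when a writeout completes. */
static void release_page(struct pool *p)
{
    p->free_pages++;
}

/*
 * NFS server driven by remote clients: it takes every page the limit
 * allows, whether or not page reclaim asked it to write anything.
 * Returns how many pages it took.
 */
static int nfs_take_all(struct pool *p, int limit)
{
    int taken = 0;
    while (try_alloc(p, limit))
        taken++;
    return taken;
}
```

With 100 free pages and a shared limit of 10, `nfs_take_all` stops at 9 free pages. ext3 asking with the same limit of 10 is refused, and a page released by a completed writeout is taken straight back by the NFS server on its next pass. Only a lower limit for ext3 (say 5) lets it through.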
(In my case the NFS server actually writes data asynchronously, so it doesn't really know it is responding to page reclaim, but the problem occurs even in a synchronous NFS server.)
An even more complex case is when ext3 depends on some other process, say it is mounted on a loopback nbd:
dirty NFS data -> NFS server -> ext3 -> nbd -> nbd server on localhost -> ext3/raw device
You can't have both the NFS server and the nbd server PF_MEMALLOC, since the NFS server may consume all memory, then wait for the nbd server to reclaim.
The memory allocators will block when memory reaches the reserved
mark. Page reclaim will ask NFS to free one page, so the server
will write something out to the filesystem, this will cause the nbd
server (also PF_MEMALLOC) to write out to its backing filesystem.
If NFS and nbd have the same limit, then NFS may cause nbd to stall. We've already established that NFS must be PF_MEMALLOC, so nbd must be PF_MEMALLOC_HARDER or something like that.
The solution I have in mind is to replace the sync allocation logic from
    if (free_mem() < some_global_limit && !(current->flags & PF_MEMALLOC))
        wait_for_kswapd();
to
    if (free_mem() < current->limit)
        wait_for_kswapd();
kswapd would have the lowest ->limit, and other processes would get limits according to their place in the food chain.
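The two allocation rules can be compared side by side. This is a hedged sketch, not kernel code: a plain integer stands in for `free_mem()`, and a hypothetical `struct task` with `memalloc` and `limit` fields stands in for `current`, its PF_MEMALLOC flag and the proposed `->limit`. The limit values are arbitrary page counts chosen only so that tasks further down the reclaim chain get lower limits, with kswapd lowest.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task struct carrying the proposed ->limit. */
struct task {
    const char *name;
    bool memalloc;      /* models PF_MEMALLOC under the old rule */
    int limit;          /* models the proposed per-task ->limit  */
};

/* Old rule: one global threshold; PF_MEMALLOC tasks never wait. */
static bool must_wait_global(int free_pages, int global_limit,
                             const struct task *t)
{
    return free_pages < global_limit && !t->memalloc;
}

/* Proposed rule: each task waits once free memory drops below its own limit. */
static bool must_wait_per_task(int free_pages, const struct task *t)
{
    return free_pages < t->limit;
}

/* Illustrative limits (arbitrary): lower limit = further down the chain. */
static const struct task kswapd_task   = { "kswapd",     true,  0 };
static const struct task nbd_task      = { "nbd server", true,  16 };
static const struct task nfs_task      = { "nfs server", true,  32 };
static const struct task ordinary_task = { "ordinary",   false, 64 };
```

At 20 free pages the old rule lets both PF_MEMALLOC servers keep allocating, so nothing stops the NFS server from using the pages the nbd server will need. Under the per-task rule the NFS server waits at that point while the nbd server, lower in the chain, can still allocate; the nbd server in turn waits below 16 pages, and kswapd never waits.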
I think this is barking up the wrong tree. It really doesn't matter
what process is freeing memory. There isn't really anything special
about the way kswapd frees memory.
To free memory you need (a) to allocate memory and (b) possibly to wait for some freeing process to make some progress. That means all processes in the freeing chain must be able to allocate at least some memory. If two processes in the chain share the same blocking logic, they may deadlock on each other.
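The condition in this paragraph can be checked mechanically if each process in the freeing chain is reduced to a single integer limit, in the spirit of the `current->limit` proposal. This is a minimal sketch under stated assumptions, not kernel code: the helper names are hypothetical, a task is assumed to wait when free memory is below its limit, and only the worst case is checked, where each waiting process has already used everything its limit allows. It says nothing about other sources of deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: a task waits when free < limit (as in the proposal),
 * so the lowest free count it can leave behind is limit - 1, or 0 when
 * its limit is 0 and it may drain memory completely.
 */
static int worst_case_free(int limit)
{
    return limit > 0 ? limit - 1 : 0;
}

/*
 * Can the task being waited on still get the one page it needs to make
 * progress after the waiting task has used everything it is allowed to?
 */
static bool waitee_can_allocate(int waiter_limit, int waitee_limit)
{
    int free_pages = worst_case_free(waiter_limit);
    return free_pages > 0 && free_pages >= waitee_limit;
}

/*
 * limits[0] is the first process in the freeing chain; limits[i + 1] is
 * the process that limits[i] waits on. Returns false if some link can
 * stall in the worst case described above.
 */
static bool chain_can_progress(const int *limits, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (!waitee_can_allocate(limits[i], limits[i + 1]))
            return false;
    }
    return true;
}
```

With illustrative limits {32, 16, 0} for the NFS server, the nbd server and the final writer, every link passes. Giving the NFS server and the nbd server the same limit ({32, 32}), or treating both as PF_MEMALLOC with no floor at all ({0, 0}), fails: the waiter can leave too little memory for the process it waits on, which is the shared-blocking-logic deadlock described above.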