[RFC] Fine-grained memory priorities and PI

From: Kyle Moffett
Date: Thu Dec 15 2005 - 03:55:28 EST


On Dec 15, 2005, at 03:21, David S. Miller wrote:
> Not when we run out, but rather when we reach some low water mark, the "critical sockets" would still use GFP_ATOMIC memory but only "critical sockets" would be allowed to do so.
>
> But even this has faults; consider the IPSEC scenario I mentioned, and this applies to any kind of encapsulation actually. Even simple tunneling examples can be concocted which make the "critical socket" idea fail.
>
> The knee-jerk reaction is "mark IPSEC's sockets critical, and mark the tunneling allocations critical, and... and..." well, you have GFP_ATOMIC then, my friend.
>
> In short, these "separate page pool" and "critical socket" ideas do not work and we need a different solution. I'm sorry folks spent so much time on them, but they are heavily flawed.

What we really need in the kernel is a more fine-grained memory priority system with PI (priority inheritance), similar in concept to what's being done to the scheduler in some of the RT patchsets. Currently we have a very black-and-white memory subsystem; when we go OOM, we just start killing processes until we are no longer OOM. Perhaps we should have some way to pass memory allocation priorities throughout the kernel, including "this request has X priority", "this request will help free up X pages of RAM", and "this memory may be dropped while dirty under certain OOM conditions to free X pages using this method".

The initial benefit would be that OOM handling would become more reliable and less of a special case. When we start to run low on free pages, it might be OK to kill the SETI@home process long before we OOM if such action might prevent the OOM. Likewise, you might be able to flag certain file pages as being "less critical", such that the kernel can kill a process and drop its dirty pages for files in /tmp. Or the kernel might do a variety of other things just by failing new allocations with low priority and forcing existing allocations with low priority to go away using preregistered handlers.
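
To make the "fail low-priority allocations, evict via preregistered handlers" idea a bit more concrete, here is a rough user-space sketch in C. It is only an illustration under stated assumptions: every name in it (prio_alloc_pages, reclaim_fn, note_reclaimed, the 8-page pool) is made up for this example and none of it is an existing kernel interface. Each allocation carries a priority and an optional handler; a request that does not fit may evict strictly less important allocations through their handlers, and otherwise fails without disturbing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Every name below is hypothetical; none of this is an existing kernel API. */

#define POOL_PAGES 8    /* size of the toy "RAM", in pages */
#define MAX_ALLOCS 16

/* Preregistered handler: told to give the memory up (kill, drop cache, ...). */
typedef void (*reclaim_fn)(void *owner);

struct prio_alloc {
    int        in_use;
    int        prio;     /* larger value = more important */
    size_t     pages;
    reclaim_fn reclaim;  /* NULL: this allocation cannot be taken back */
    void      *owner;
};

static struct prio_alloc allocs[MAX_ALLOCS];
static size_t free_pages = POOL_PAGES;

/* Example handler: just counts how often its owner was asked to let go. */
static void note_reclaimed(void *owner)
{
    ++*(int *)owner;
}

/* Lowest-priority reclaimable allocation strictly below 'prio', or NULL. */
static struct prio_alloc *pick_victim(int prio)
{
    struct prio_alloc *victim = NULL;

    for (int i = 0; i < MAX_ALLOCS; i++) {
        struct prio_alloc *a = &allocs[i];

        if (!a->in_use || !a->reclaim || a->prio >= prio)
            continue;
        if (!victim || a->prio < victim->prio)
            victim = a;
    }
    return victim;
}

/* Pages we could get back by evicting everything less important than 'prio'. */
static size_t reclaimable_below(int prio)
{
    size_t total = 0;

    for (int i = 0; i < MAX_ALLOCS; i++)
        if (allocs[i].in_use && allocs[i].reclaim && allocs[i].prio < prio)
            total += allocs[i].pages;
    return total;
}

/* Returns a slot index on success, -1 if the request has to fail. */
static int prio_alloc_pages(size_t pages, int prio, reclaim_fn reclaim, void *owner)
{
    /* Fail early rather than evict victims for a request that cannot fit. */
    if (free_pages + reclaimable_below(prio) < pages)
        return -1;

    /* The check above guarantees a victim exists while we are still short. */
    while (free_pages < pages) {
        struct prio_alloc *v = pick_victim(prio);

        v->reclaim(v->owner);
        free_pages += v->pages;
        v->in_use = 0;
    }

    for (int i = 0; i < MAX_ALLOCS; i++) {
        if (allocs[i].in_use)
            continue;
        allocs[i] = (struct prio_alloc){ 1, prio, pages, reclaim, owner };
        free_pages -= pages;
        return i;
    }
    return -1;  /* bookkeeping table is full */
}
```

In this toy model, a background job such as the SETI@home process mentioned above would allocate at a low priority with a handler attached. A later request at a higher priority that does not fit would trigger that handler and take the pages, while a request that cannot be satisfied even after evicting everything less important simply fails up front. A real design would also need the "this request will help free X pages" hint and the inheritance part, which this sketch leaves out.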

When processes request memory through any subsystem, their memory priority would be passed through the kernel layers to the allocator, along with any associated information about how to free the memory in a low-memory condition. As a result, I could configure my database to have a much higher priority than SETI@home (or BOINC or whatever), so that when the database server wants to fill memory with clean DB cache pages, the kernel will kill SETI@home for its memory, even if we could just leave some DB cache pages unfaulted.

Questions? Comments? "This is a terrible idea that should never have seen the light of day"? Both constructive and destructive criticism welcomed! (Just please keep the language clean! :-D)

Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.


