Avi Kivity wrote: This may not be possible. What if subsystem A depends on subsystem B to do its work, both are critical, and subsystem A allocated all the memory reserve?
1. If you have two subsystems which allocate critical pages, how do you
protect against the condition where one subsystem allocates all the
critical memory, causing the second to OOM?
You don't. You make sure that you size the critical pool appropriately for
your workload.
Sure, but that's just an example of a critical subsystem.
2. There already exists a critical pool: ordinary allocations fail if
free memory is below some limit, but special processes (kswapd) can
allocate that memory by setting PF_MEMALLOC. Perhaps this should be
extended, possibly with a per-process threshold.
The exception for threads with PF_MEMALLOC set is there because those
threads are essentially promising that if the kernel gives them memory,
they will use that memory to free up MORE memory. If we ignore that
promise, and (ab)use the PF_MEMALLOC flag to simply bypass the
zone_watermarks, we'll simply OOM faster, and potentially in situations
that could be avoided (i.e., we steal memory that kswapd could have used to
free up more memory).
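
To make the argument above concrete, here is a rough userspace toy model, not kernel code. Everything in it (toy_zone, toy_alloc, toy_reclaim_done, TOY_PF_MEMALLOC) is a hypothetical name made up for illustration, and the single min_reserve threshold is a simplifying assumption standing in for the real watermark logic.

```c
#include <stdbool.h>

/* Hypothetical flag for this toy model; NOT the kernel's PF_MEMALLOC value. */
#define TOY_PF_MEMALLOC 0x1u

/* Assumed, simplified stand-in for a zone with a single watermark. */
struct toy_zone {
    long free_pages;   /* pages currently free */
    long min_reserve;  /* ordinary allocations may not dip below this */
};

/* Returns true and takes the pages if the request is allowed. */
static bool toy_alloc(struct toy_zone *z, unsigned int task_flags, long nr_pages)
{
    if (nr_pages <= 0 || nr_pages > z->free_pages)
        return false;

    /* Ordinary callers must leave the reserve intact. */
    if (!(task_flags & TOY_PF_MEMALLOC) &&
        z->free_pages - nr_pages < z->min_reserve)
        return false;

    /* Flagged callers may eat into the reserve. */
    z->free_pages -= nr_pages;
    return true;
}

/* A reclaimer keeps its promise by handing back more pages than it took. */
static void toy_reclaim_done(struct toy_zone *z, long nr_freed)
{
    z->free_pages += nr_freed;
}
```

In this model, a flagged caller that takes reserve pages and never calls toy_reclaim_done() drains the zone to zero, after which even a genuine reclaimer cannot get a single page. A caller that honors the promise borrows a few pages and leaves the zone with more free memory than before.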