Re: [RFC][PATCH 0/8] Critical Page Pool

From: Avi Kivity
Date: Fri Nov 18 2005 - 15:42:26 EST

Next message: Alasdair G Kergon: "[PATCH] device-mapper ioctl: add skip lock_fs flag"
Previous message: linux-os (Dick Johnson): "Re: Does Linux has File Stream mapping support...?"
In reply to: Matthew Dobson: "Re: [RFC][PATCH 0/8] Critical Page Pool"
Next in thread: Paul Jackson: "Re: [RFC][PATCH 0/8] Critical Page Pool"

Matthew Dobson wrote:

Avi Kivity wrote:

1. If you have two subsystems which allocate critical pages, how do you
protect against the condition where one subsystem allocates all the
critical memory, causing the second to OOM?

You don't. You make sure that you size the critical pool appropriately for
your workload.

This may not be possible. What if subsystem A depends on subsystem B to do its work, both are critical, and subsystem A has allocated the entire memory reserve? B then cannot allocate, so A cannot make progress either.
If A and B have different allocation thresholds, the deadlock is avoided.

At the very least you need a critical pool per subsystem.
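
To illustrate the per-subsystem pool idea, here is a minimal user-space sketch in C. It is a toy model, not kernel code: the names critical_pools, critical_alloc, critical_free and the two subsystem IDs are hypothetical and are not part of the patch set. The only point it shows is that A draining its own pool leaves B's pool untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subsystem IDs; assume A depends on B to make progress. */
enum subsys { SUBSYS_A, SUBSYS_B, NR_SUBSYS };

/* One reserve per subsystem, counted in pages. */
struct critical_pools {
	size_t free_pages[NR_SUBSYS];
};

/* Take one page from the caller's own pool only; never borrow from another. */
static bool critical_alloc(struct critical_pools *p, enum subsys who)
{
	assert(p != NULL && who < NR_SUBSYS);
	if (p->free_pages[who] == 0)
		return false;
	p->free_pages[who]--;
	return true;
}

/* Return one page to the pool it was taken from. */
static void critical_free(struct critical_pools *p, enum subsys who)
{
	assert(p != NULL && who < NR_SUBSYS);
	p->free_pages[who]++;
}
```

With pools sized { 2, 1 }, A can take two pages and then fails, while B can still take its one page. With a single shared pool of three pages, A could have taken all three.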

2. There already exists a critical pool: ordinary allocations fail if
free memory is below some limit, but special processes (kswapd) can
allocate that memory by setting PF_MEMALLOC. Perhaps this should be
extended, possibly with a per-process threshold.

The exception for threads with PF_MEMALLOC set is there because those
threads are essentially promising that if the kernel gives them memory,
they will use that memory to free up MORE memory. If we ignore that
promise, and (ab)use the PF_MEMALLOC flag to simply bypass the
zone_watermarks, we'll simply OOM faster, and potentially in situations
that could be avoided (i.e., we steal memory that kswapd could have used to
free up more memory).

Sure, but that's just an example of a critical subsystem.

If we introduce yet another mechanism for critical memory allocation, we'll have a hard time making different subsystems, which use different critical allocation mechanisms, play well together.

I propose that instead of a single watermark, there should be a watermark per critical subsystem. The watermarks would be arranged according to the dependency graph, with the depended-on services allowed to go the deepest into the reserves.

(Instead of PF_MEMALLOC, have a tsk->memory_allocation_threshold, or similar. Set it to 0 for kswapd, and for other subsystems according to taste.)
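
A rough sketch of what such a check might look like, again as a user-space C model and not a patch. The struct task, struct zone_model and task_alloc_pages names are made up for illustration; only the memory_allocation_threshold field name comes from the suggestion above, and I am assuming it means "free pages that must remain after the allocation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of task_struct. */
struct task {
	/* Free pages that must remain after this task allocates.
	 * 0 means the task may drain the reserve completely. */
	size_t memory_allocation_threshold;
};

/* Toy model of a zone: just a free page counter. */
struct zone_model {
	size_t free_pages;
};

/* Allow the allocation only if it leaves at least the task's
 * threshold of free pages behind. */
static bool task_alloc_pages(struct zone_model *z, const struct task *tsk,
			     size_t nr)
{
	assert(z != NULL && tsk != NULL);
	if (z->free_pages < nr)
		return false;
	if (z->free_pages - nr < tsk->memory_allocation_threshold)
		return false;
	z->free_pages -= nr;
	return true;
}
```

For example, with 100 free pages and thresholds of 64 (ordinary task), 32 (subsystem A), 16 (subsystem B, which A depends on) and 0 (kswapd), each task is refused once free memory would fall below its own threshold, and the depended-on tasks can still allocate after the others have been cut off.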

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

