A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
As I've said, it is quite easy to land in a similar situation even with
tmpfs/MAP_ANON|MAP_SHARED on a swapless system. Neither of the two is
really uncommon. It would be even worse in that those would be allowed to
consume both CMA/ZONE_MOVABLE.
IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
a) Is movable, can land in ZONE_MOVABLE/CMA
b) Can be limited by sizing tmpfs appropriately
AFAIK, what you describe is a problem with memory overcommit, not with zone
imbalances (below). Or what am I missing?
It can be a problem for both. If you have just too much shm (do not
forget about MAP_SHARED|MAP_ANON, which is much harder to size from an
admin POV) then migratability doesn't really help because you need
free memory to migrate to. Without reclaimability this can easily become a
problem. That is why I am saying this is not really a new problem.
Swapless systems are not all that uncommon.
One has to be very careful when relying on CMA or movable zones. This is
definitely worth a comment in the kernel command line parameter
documentation. But this is not a new problem.
I see the following thing worth documenting:
Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
ZONE_MOVABLE/CMA.
Assume you make use of 1.5GB of secretmem. Your system might run into OOM
any time although you still have plenty of memory on ZONE_MOVABLE (and even
swap!), simply because you are making excessive use of unmovable allocations
(for user space!) in an environment where you should not make excessive use
of unmovable allocations (e.g., where should page tables go?).
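The arithmetic behind that scenario can be sketched as below. This is only a back-of-the-envelope model, not how the kernel accounts memory: the helper name `unmovable_headroom` is made up for illustration, and it assumes secretmem and other unmovable allocations can only be served from the 2GB outside ZONE_MOVABLE/CMA.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def unmovable_headroom(non_movable_bytes: int, unmovable_in_use: int) -> int:
    """Bytes still available to unmovable allocations (hypothetical helper).

    Assumption: unmovable allocations are served only from the zones
    outside ZONE_MOVABLE/CMA, so movable memory does not count here.
    """
    return max(non_movable_bytes - unmovable_in_use, 0)


# Numbers taken from the scenario described above.
non_movable = 2 * GIB       # ZONE_NORMAL/ZONE_DMA
movable = 4 * GIB           # ZONE_MOVABLE/CMA
secretmem = 3 * GIB // 2    # 1.5GB of secretmem

headroom = unmovable_headroom(non_movable, secretmem)
print(f"left for page tables, slab, ...: {headroom // MIB} MiB")
print(f"movable memory that cannot help: {movable // MIB} MiB")
```

Under these assumptions only 512 MiB remain for every other unmovable user, no matter how much of the 4GB of movable memory (or swap) is free.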
Yes, you are right of course and I am not really disputing this. But I
would argue that 2:1 Movable/Normal is something where problems are to be
expected already. "Lowmem" allocations can easily trigger OOM even without
secret mem in the picture. All it takes is to allocate a lot of GFP_KERNEL
or even GFP_{HIGH}USER. Really, it is CMA/MOVABLE that are the elephant in
the room and one has to be really careful when relying on them.
The existing controls (mlock limit) don't really match the current semantics
of that memory. I repeat it once again: secretmem *currently* resembles
long-term pinned memory, not mlocked memory.
Well, if we had a proper user space pinning accounting then I would
agree that there is a better model to use. But we don't. And previous
attempts to achieve that have failed. So I am afraid that we do not have
much choice left other than using mlock as a model.
Things will change when
implementing migration support for secretmem pages. Until then, the
semantics are different and this should be spelled out.
For long-term pinnings this is kind of obvious, still we're now documenting
it because it's dangerous to not be aware of. Secretmem behaves exactly the
same and I think this is worth spelling out: secretmem has the potential of
being used much more often than fairly special vfio/rdma/ ...
Yeah, I do agree that pinning is a problem for movable/CMA but most
people simply do not care about those. Movable is the thing for hotplug
and a really weird fragmentation avoidance IIRC, and CMA is mostly to
handle crap HW. If those are to be used along with secret mem or
longterm GUP then they will constantly bump into corner cases. Do not
get me wrong, we should be looking at those problems, we should even
document them, but I do not see this as anything new. We should probably
have a central place in Documentation explaining all those problems. I
would even be happy to see an explicit note in the tunables - e.g.
configuring movable/normal in 2:1 will get you back to 32b times wrt.
low mem problems.
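For anyone wanting to check how close a given machine is to such a 2:1 split, here is a rough sketch that sums the per-zone "managed" page counts from /proc/zoneinfo-style text. The function names are made up for illustration, the parsing assumes the usual "Node N, zone NAME" and "managed COUNT" line layout, and CMA areas are not visible this way because they are not a zone of their own.

```python
import re
from collections import defaultdict

# Assumed /proc/zoneinfo layout: "Node 0, zone   Normal" headers followed
# by indented per-zone counters, one of which is "managed <pages>".
ZONE_RE = re.compile(r"^Node\s+\d+,\s+zone\s+(\S+)")
MANAGED_RE = re.compile(r"^\s+managed\s+(\d+)")


def managed_pages_by_zone(zoneinfo_text: str) -> dict:
    """Sum managed pages per zone name across all nodes."""
    totals = defaultdict(int)
    zone = None
    for line in zoneinfo_text.splitlines():
        header = ZONE_RE.match(line)
        if header:
            zone = header.group(1)
            continue
        managed = MANAGED_RE.match(line)
        if managed and zone is not None:
            totals[zone] += int(managed.group(1))
    return dict(totals)


def movable_ratio(totals: dict) -> float:
    """Pages in the Movable zone divided by pages in all other zones."""
    movable = totals.get("Movable", 0)
    rest = sum(pages for name, pages in totals.items() if name != "Movable")
    return movable / rest if rest else float("inf")


# Made-up sample: 2GB Normal and 4GB Movable with 4KiB pages.
SAMPLE = """\
Node 0, zone   Normal
  pages free     1000
        managed  524288
Node 0, zone  Movable
  pages free     2000
        managed  1048576
"""

totals = managed_pages_by_zone(SAMPLE)
print(totals, movable_ratio(totals))
```

On a real system the input would be the contents of /proc/zoneinfo, e.g. `managed_pages_by_zone(open("/proc/zoneinfo").read())`.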