We've had a similar patch in our tree for a year and a half because
of CMA migration failures, not just for a speedup in allocation
time. I understand that CMA is not the fast case or the general use
case but the problem is that the cost of CMA failure is very high
(complete failure of the feature using CMA). Putting CMA pages on the
PCP lists means they may be picked up by users who temporarily make
the movable pages unmovable (page cache etc.), which prevents the
allocation from succeeding. The problem still exists even if the CMA
pages are not on the PCP lists, but the window gets slightly smaller.

I understand. I have seen many people who want to use CMA tweak their
systems to make it work, and even with their best effort it doesn't
work well, because CMA doesn't guarantee that it can get free space:
there are lots of hurdles (get_user_pages, AIO ring buffers, buffer
cache, a shortage of free memory for migration, no swap and so on).
On top of that, some people want CMA allocation to be fast. SIGH.
Yeah, at the moment, CMA really sucks.

This really highlights one of the biggest issues with CMA today.
Movable pages may return -EBUSY for any number of reasons. For
non-CMA pages this is mostly fine, because another movable page may
be substituted for the movable page that is busy. CMA is a restricted
range though so any failure in that range is very costly because CMA
regions are generally sized exactly for the use cases at hand which
means there is very little extra space for retries.
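
A toy sketch of that difference, in C. This is not kernel code: the
busy[] array and the alloc_any/alloc_range helpers are hypothetical
names made up for illustration, and "busy" simply stands for a page
whose migration would currently return -EBUSY.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: busy[i] is true when page i cannot be migrated right now. */

/* Ordinary movable allocation: any page that is not busy will do. */
static bool alloc_any(const bool *busy, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (!busy[i])
			return true;
	return false;
}

/* CMA-style allocation: every page in [start, start + len) must move. */
static bool alloc_range(const bool *busy, size_t start, size_t len)
{
	for (size_t i = start; i < start + len; i++)
		if (busy[i])
			return false;	/* one busy page fails the whole range */
	return true;
}
```

In this model a single busy page is harmless to alloc_any but fatal
to any alloc_range call whose range covers it.
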

To make CMA actually usable, we've had to go through and add in
hacks/quirks that prevent CMA pages from being allocated in any path
which may prevent migration. I've had mixed feelings about whether
this is the right path or whether the definition of MIGRATE_CMA needs
to be changed to be more restrictive (can't prevent migration).

The fundamental problem is that any subsystem can grab a page at any
time, and none of them guarantees to release it soon, or within the
time the CMA user wants, so it turns into a non-deterministic mess
that just hooks into the core MM here and there.
Sometimes I see people try to solve it case by case with ad-hoc
approaches. I guess that will be a never-ending story as the kernel
evolves.

I suggest that we could build a new wheel with the frontswap/cleancache
stuff. The idea is that pages in frontswap/cleancache are already
evicted from the kernel's POV, so we can guarantee that nobody can
grab a page in the CMA area, and we could remove lots of hooks from
the core MM which just complicate MM without benefit.

As a benefit, cleancache pages can be dropped easily, so it would be
fast to get free space, but frontswap pages have to be moved
somewhere. If there are enough free pages, they can be migrated
there. Optionally we could compress them. Otherwise, we could page
them out to the backing device. Yeah, it could be slower than
migration, but at least we could ideally estimate the time from the
storage speed, so we could have a tunable knob. If someone wants fast
CMA, they could control it with the cleancache:frontswap ratio.
IOW, the higher the frontswap page ratio is, the slower it would be.
The important thing is that the admin would have a tunable control
knob, and it guarantees getting CMA free space in deterministic time.
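
A rough sketch of how such an estimate could be computed, assuming
cleancache pages are dropped at no cost and every frontswap page has
to be written to the backing device at a fixed bandwidth. Nothing here
is an existing kernel interface: cma_worst_case_ms, its parameters and
the 4 KiB page size are assumptions for illustration only.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumed 4 KiB pages */

/*
 * Hypothetical model, not kernel code: worst-case milliseconds needed
 * to empty a CMA region when every frontswap page must be written out
 * and every cleancache page can simply be dropped.
 */
static uint64_t cma_worst_case_ms(uint64_t region_pages,
				  unsigned int frontswap_pct,
				  uint64_t write_bytes_per_sec)
{
	uint64_t frontswap_pages, bytes;

	if (frontswap_pct > 100 || write_bytes_per_sec == 0)
		return UINT64_MAX;	/* invalid input */

	frontswap_pages = region_pages * frontswap_pct / 100;
	bytes = frontswap_pages * PAGE_SIZE_BYTES;

	/* Round up so the result is an upper bound in this model. */
	return (bytes * 1000 + write_bytes_per_sec - 1) / write_bytes_per_sec;
}
```

Under these assumptions a 64 MiB region (16384 pages) that is 50%
frontswap on a 32 MiB/s device comes out at 1000 ms, and lowering the
frontswap share to 25% halves that bound.
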

As a drawback, if we fail to tune the ratio, memory efficiency would
be bad and it ends up thrashing. But you are saying you have been
using CMA without movable fallback, which means it's already
statically reserved memory and not really CMA at all, so you have
already lost memory efficiency and can still fail to get space. So I
think it's a good trade-off for embedded people.

If anyone is interested in the idea, I will move on it.
If it sounds like a crazy idea, please feel free to ignore it.