Re: [RFC PATCH 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used
From: Joonsoo Kim
Date: Mon May 12 2014 - 21:12:30 EST
On Mon, May 12, 2014 at 10:04:29AM -0700, Laura Abbott wrote:
> Hi,
>
> On 5/7/2014 5:32 PM, Joonsoo Kim wrote:
> > CMA was introduced to provide physically contiguous pages at runtime.
> > For this purpose, it reserves memory at boot time. Although it reserves
> > this memory, the reserved memory can still be used for movable memory
> > allocation requests. This use case benefits systems that need the CMA
> > reserved memory only infrequently, and it is one of the main purposes
> > of introducing CMA.
> >
> > But there is a problem in the current implementation: it works just
> > like a plain reserved-memory approach. The pages on cma reserved
> > memory are hardly ever used for movable memory allocation. This is
> > caused by the combination of allocation and reclaim policy.
> >
> > The pages on cma reserved memory are allocated only if there is no
> > movable memory, that is, as a fallback allocation. So this fallback
> > allocation starts only under heavy memory pressure. Even under memory
> > pressure, movable allocations easily succeed, since there are many
> > free pages on cma reserved memory. But this is not the case for
> > unmovable and reclaimable allocations, because they can't use the
> > pages on cma reserved memory. For watermark checking, these
> > allocations regard the system's free memory as
> > (free pages - free cma pages), that is, free unmovable pages + free
> > reclaimable pages + free movable pages. Because we have already
> > exhausted the movable pages, the only free pages left are of the
> > unmovable and reclaimable types, and this is a really small amount.
> > So the watermark check fails. This wakes up kswapd to free enough
> > memory for unmovable and reclaimable allocations, and kswapd does so.
> > So before we fully utilize the pages on cma reserved memory, kswapd
> > starts to reclaim memory and tries to bring free memory above the
> > high watermark. This watermark check by kswapd doesn't count free
> > cma pages, so many movable pages are reclaimed. After that, we have
> > a lot of free movable pages again, so the fallback allocation doesn't
> > happen again. To conclude, the amount of free memory in meminfo,
> > which includes free CMA pages, hovers around 512 MB if I reserve
> > 512 MB of memory for CMA.
> >
> > I found this problem in the following experiment.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> > CMA reserve:        0 MB         512 MB
> > Elapsed-time:       234.8        361.8
> > Average-MemFree:    283880 KB    530851 KB
> >
> > To solve this problem, I can think of the following two possible
> > solutions.
> > 1. allocate the pages on cma reserved memory first, and if they are
> >    exhausted, allocate movable pages.
> > 2. interleaved allocation: try to allocate specific amounts of memory
> >    from cma reserved memory and then allocate from free movable memory.
> >
> > I tested approach #1 and found a problem. Although free memory in
> > meminfo can hover around the low watermark, free memory fluctuates
> > widely, because too many pages are reclaimed when kswapd is invoked.
> > The reason for this behaviour is that successively allocated CMA pages
> > are on the LRU list in that order and kswapd reclaims them in the same
> > order. This memory doesn't help kswapd's watermark check, so too many
> > pages are reclaimed, I guess.
> >
>
> We have an out of tree implementation of #1 and so far it's worked for us
> although we weren't looking at the same metrics. I don't completely
> understand the issue you pointed out with #1. It sounds like the issue is
> that CMA pages are already in use by other processes and on LRU lists and
> because the pages are on LRU lists these aren't counted towards the
> watermark by kswapd. Is my understanding correct?
Hello,

Yes, your understanding is correct.
kswapd wants to reclaim normal (non-CMA) pages, but with approach #1 the
LRU lists can contain long runs of consecutive CMA pages, so the
watermark isn't restored easily.
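
To make the watermark point concrete, here is a minimal userspace sketch
of the two views of free memory. It is not kernel code: struct zone_model
and both function names are hypothetical, and the real check has more
terms. It only shows why freeing CMA pages does not help a check that
excludes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one zone's counters (in pages). */
struct zone_model {
	long free_pages;     /* all free pages, including free CMA pages */
	long free_cma_pages; /* free pages inside the CMA reserved area */
	long high_wmark;     /* watermark that kswapd tries to restore */
};

/*
 * View of an allocation that may not use CMA pages (unmovable or
 * reclaimable): free CMA pages are not counted as free memory.
 */
bool wmark_ok_without_cma(const struct zone_model *z)
{
	assert(z->free_cma_pages <= z->free_pages);
	return z->free_pages - z->free_cma_pages > z->high_wmark;
}

/* View of a movable allocation, which may also take CMA pages. */
bool wmark_ok_with_cma(const struct zone_model *z)
{
	return z->free_pages > z->high_wmark;
}
```

In this model, a zone whose free pages are almost all CMA pages passes
the second check but fails the first, and reclaiming more CMA pages from
the LRU does not change the result of the first check.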
>
> > So, I implemented approach #2.
> > One thing I should note is that we should not change the allocation
> > target (movable list or cma) on each allocation attempt, since this
> > prevents allocated pages from being physically contiguous, which can
> > hurt the performance of some I/O devices. To solve this, I keep the
> > allocation target for at least pageblock_nr_pages attempts and make
> > this number reflect the ratio of free pages excluding free cma pages
> > to free cma pages. With this approach, the system works very smoothly
> > and fully utilizes the pages on cma reserved memory.
> >
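
The ratio described above can be sketched in userspace as follows. This
mirrors the reset path of __rmqueue_cma() in the patch further down, but
it is only a model: struct try_counters, recharge() and the value 1024
for PAGEBLOCK_NR_PAGES are assumptions made for illustration.

```c
#include <assert.h>

/* Stand-in for the kernel's pageblock_nr_pages; 1024 is an assumption. */
#define PAGEBLOCK_NR_PAGES 1024L

struct try_counters {
	long nr_try_movable; /* attempts to serve from the movable list */
	long nr_try_cma;     /* attempts to serve from the CMA area */
};

/*
 * Recharge both counters from the current free-page counts.
 *   free       - all free pages in the zone (including CMA)
 *   free_cma   - free CMA pages
 *   high_wmark - the zone's high watermark
 */
struct try_counters recharge(long free, long free_cma, long high_wmark)
{
	struct try_counters c = { 0, 0 };
	long free_wmark;

	assert(free >= free_cma);

	/* No free CMA pages: allocate only from the movable list. */
	if (free_cma <= 0) {
		c.nr_try_movable = PAGEBLOCK_NR_PAGES;
		return c;
	}

	free_wmark = free - free_cma - high_wmark;

	/* Normal pages are under pressure: allocate only from CMA. */
	if (free_wmark <= 0) {
		c.nr_try_cma = PAGEBLOCK_NR_PAGES;
		return c;
	}

	/* The smaller pool gets one pageblock; the larger is scaled up. */
	if (free_wmark > free_cma) {
		c.nr_try_movable = free_wmark * PAGEBLOCK_NR_PAGES / free_cma;
		c.nr_try_cma = PAGEBLOCK_NR_PAGES;
	} else {
		c.nr_try_movable = PAGEBLOCK_NR_PAGES;
		c.nr_try_cma = free_cma * PAGEBLOCK_NR_PAGES / free_wmark;
	}
	return c;
}
```

For example, with 4000 free pages of which 1000 are CMA pages and a high
watermark of 1000, the model gives 2048 movable attempts for every 1024
CMA attempts, so each target is kept for at least one pageblock's worth
of allocations.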
> > Following is the experimental result of this patch.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > make -j24
> >
> > <Before>
> > CMA reserve:        0 MB         512 MB
> > Elapsed-time:       234.8        361.8
> > Average-MemFree:    283880 KB    530851 KB
> > pswpin:             7            110064
> > pswpout:            452          767502
> >
> > <After>
> > CMA reserve:        0 MB         512 MB
> > Elapsed-time:       234.2        235.6
> > Average-MemFree:    281651 KB    290227 KB
> > pswpin:             8            8
> > pswpout:            430          510
> >
> > There is no difference if we don't have cma reserved memory (0 MB case).
> > But with cma reserved memory (512 MB case), we fully utilize this
> > reserved memory through this patch and the system behaves as if
> > it doesn't reserve any memory.
>
> What metric are you using to determine all CMA memory was fully used?
> We've been checking /proc/pagetypeinfo
In this result, we can check whether more CMA memory was used by
looking at the MemFree stat.
I used /proc/zoneinfo to get more insight.
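
As a rough illustration of reading such a counter, the sketch below sums
one field over all zones in zoneinfo-style text. It assumes the field of
interest is nr_free_cma and that each counter appears as a "name value"
line; sum_zoneinfo_field() is a hypothetical helper, not an existing
tool, and the caller is expected to have read the file into a string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the values of every "<key> <number>" line in zoneinfo-style text.
 * Leading indentation is skipped before the key is compared.
 */
long sum_zoneinfo_field(const char *text, const char *key)
{
	size_t keylen = strlen(key);
	const char *line = text;
	long total = 0;

	assert(keylen > 0);

	while (*line) {
		const char *p = line;
		const char *eol = strchr(line, '\n');

		/* Skip indentation in front of the field name. */
		while (*p == ' ' || *p == '\t')
			p++;

		/* Require whitespace after the key to avoid prefix matches. */
		if (strncmp(p, key, keylen) == 0 &&
		    (p[keylen] == ' ' || p[keylen] == '\t'))
			total += strtol(p + keylen, NULL, 10);

		if (!eol)
			break;
		line = eol + 1;
	}
	return total;
}
```

Sampling this sum periodically during the workload would show how many
CMA pages stay free, which complements the MemFree average above.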
> >
> > With this patch, we aggressively allocate the pages on cma reserved
> > memory, so the latency of CMA allocation can rise. Below is the
> > experimental result for latency.
> >
> > 4 CPUs, 1024 MB, VIRTUAL MACHINE
> > CMA reserve: 512 MB
> > Background Workload: make -jN
> > Real Workload: 8 MB CMA allocation/free 20 times with 5 sec interval
> >
> > N:                       1          4          8          16
> > Elapsed-time(Before):    4309.75    9511.09    12276.1    77103.5
> > Elapsed-time(After):     5391.69    16114.1    19380.3    34879.2
> >
> > So generally we can see a latency increase. The ratio of this increase
> > is rather big - up to 70%. But under the heavy workload, it shows a
> > latency decrease - up to 55%. This may be a worst-case scenario, but
> > reducing it would be important for some systems, so I can say that
> > this patch has advantages and disadvantages in terms of latency.
> >
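
The percentages quoted above follow from the table by a simple relative
change. The helper below is a hypothetical sketch for checking that
arithmetic, not part of the patch.

```c
#include <assert.h>

/*
 * Relative change from `before` to `after`, in percent.
 * A negative result means the value decreased.
 */
double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Applied to the table, N=4 (9511.09 to 16114.1) gives an increase of a
little over 69%, and N=16 (77103.5 to 34879.2) gives a decrease of a
little under 55%, which matches the "up to 70%" and "up to 55%" figures.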
>
> Do you have any statistics related to failed migration from this? Latency
> and utilization are issues but so is migration success. In the past we've
> found that an increase in CMA utilization was related to increase in CMA
> migration failures because pages were unmigratable. The current
> workaround for this is limiting CMA pages to be used for user processes
> only and not the file cache. Both of these have their own problems.
I have the retry counts for doing the 8 MB CMA allocation 20 times.
These numbers are the average of 5 runs.

N:                   1      4      8      16
Retrying(Before):    0      0      0.6    12.2
Retrying(After):     1.4    1.8    3      3.6

If you know of any permanent failure case with file cache pages, please
let me know.

The CMA migration failure I already know about with file cache pages is
the problem related to the buffer_head lru, which you mentioned before.

> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fac5509..3ff24d4 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -389,6 +389,12 @@ struct zone {
> > int compact_order_failed;
> > #endif
> >
> > +#ifdef CONFIG_CMA
> > + int has_cma;
> > + int nr_try_cma;
> > + int nr_try_movable;
> > +#endif
> > +
> > ZONE_PADDING(_pad1_)
> >
> > /* Fields commonly accessed by the page reclaim scanner */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 674ade7..6f2b27b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -788,6 +788,16 @@ void __init __free_pages_bootmem(struct page *page, unsigned int order)
> > }
> >
> > #ifdef CONFIG_CMA
> > +void __init init_alloc_ratio_counter(struct zone *zone)
> > +{
> > + if (zone->has_cma)
> > + return;
> > +
> > + zone->has_cma = 1;
> > + zone->nr_try_movable = 0;
> > + zone->nr_try_cma = 0;
> > +}
> > +
> > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
> > void __init init_cma_reserved_pageblock(struct page *page)
> > {
> > @@ -803,6 +813,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
> > set_pageblock_migratetype(page, MIGRATE_CMA);
> > __free_pages(page, pageblock_order);
> > adjust_managed_page_count(page, pageblock_nr_pages);
> > + init_alloc_ratio_counter(page_zone(page));
> > }
> > #endif
> >
> > @@ -1136,6 +1147,69 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> > return NULL;
> > }
> >
> > +#ifdef CONFIG_CMA
> > +static struct page *__rmqueue_cma(struct zone *zone, unsigned int order,
> > + int migratetype)
> > +{
> > + long free, free_cma, free_wmark;
> > + struct page *page;
> > +
> > + if (migratetype != MIGRATE_MOVABLE || !zone->has_cma)
> > + return NULL;
> > +
> > + if (zone->nr_try_movable)
> > + goto alloc_movable;
> > +
> > +alloc_cma:
> > + if (zone->nr_try_cma) {
> > + /* Okay. Now, we can try to allocate the page from cma region */
> > + zone->nr_try_cma--;
> > + page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +
> > + /* CMA pages can vanish through CMA allocation */
> > + if (unlikely(!page && order == 0))
> > + zone->nr_try_cma = 0;
> > +
> > + return page;
> > + }
> > +
> > + /* Reset ratio counter */
> > + free_cma = zone_page_state(zone, NR_FREE_CMA_PAGES);
> > +
> > + /* No cma free pages, so recharge only movable allocation */
> > + if (free_cma <= 0) {
> > + zone->nr_try_movable = pageblock_nr_pages;
> > + goto alloc_movable;
> > + }
> > +
> > + free = zone_page_state(zone, NR_FREE_PAGES);
> > + free_wmark = free - free_cma - high_wmark_pages(zone);
> > +
> > + /*
> > + * free_wmark is below than 0, and it means that normal pages
> > + * are under the pressure, so we recharge only cma allocation.
> > + */
> > + if (free_wmark <= 0) {
> > + zone->nr_try_cma = pageblock_nr_pages;
> > + goto alloc_cma;
> > + }
> > +
> > + if (free_wmark > free_cma) {
> > + zone->nr_try_movable =
> > + (free_wmark * pageblock_nr_pages) / free_cma;
> > + zone->nr_try_cma = pageblock_nr_pages;
> > + } else {
> > + zone->nr_try_movable = pageblock_nr_pages;
> > + zone->nr_try_cma = free_cma * pageblock_nr_pages / free_wmark;
> > + }
> > +
> > + /* Reset complete, start on movable first */
> > +alloc_movable:
> > + zone->nr_try_movable--;
> > + return NULL;
> > +}
> > +#endif
> > +
> > /*
> > * Do the hard work of removing an element from the buddy allocator.
> > * Call me with the zone->lock already held.
> > @@ -1143,10 +1217,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> > static struct page *__rmqueue(struct zone *zone, unsigned int order,
> > int migratetype)
> > {
> > - struct page *page;
> > + struct page *page = NULL;
> > +
> > + if (IS_ENABLED(CONFIG_CMA))
> > + page = __rmqueue_cma(zone, order, migratetype);
> >
> > retry_reserve:
> > - page = __rmqueue_smallest(zone, order, migratetype);
> > + if (!page)
> > + page = __rmqueue_smallest(zone, order, migratetype);
> >
> > if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> > page = __rmqueue_fallback(zone, order, migratetype);
> > @@ -4849,6 +4927,8 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
> > zone_seqlock_init(zone);
> > zone->zone_pgdat = pgdat;
> > zone_pcp_init(zone);
> > + if (IS_ENABLED(CONFIG_CMA))
> > + zone->has_cma = 0;
> >
> > /* For bootup, initialized properly in watermark setup */
> > mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
> >
>
> I'm going to see about running this through tests internally for comparison.
> Hopefully I'll get useful results in a day or so.
Okay.
I really hope to see your results. :)
Thanks for your interest.
Thanks.