Re: [RFC][PATCH 8/8] RSS controller support reclamation

From: Balbir Singh
Date: Fri Nov 10 2006 - 04:17:21 EST


Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Reclaim memory as we hit the max_shares limit. The code for reclamation
>> is inspired from Dave Hansen's challenged memory controller and from the
>> shrink_all_memory() code
>>
>> Reclamation can be triggered from two paths
>>
>> 1. While incrementing the RSS, we hit the limit of the container
>> 2. A container is resized, such that it's new limit is below its current
>> RSS
>>
>> In (1) reclamation takes place in the background.
>
> Hmm... This is not a hard limit in this case, right? And in case
> of overloaded system from the moment reclamation thread is woken
> up till the moment it starts shrinking zones container may touch
> too many pages...
>
> That's not good.

Yes, please see my comments in the TODO's. Hard limits should be easy
to implement, it's a question of calling the correct routine based
on policy.

>
>> TODO's
>>
>> 1. max_shares currently works like a soft limit. The RSS can grow beyond it's
>> limit. One possible fix is to introduce a soft limit (reclaim when the
>> container hits the soft limit) and fail when we hit the hard limit
>
> Such soft limit doesn't help also. It just makes effects on
> low-loaded system smoother.
>
> And what about a hard limit - how would you fail in page fault in
> case of limit hit? SIGKILL/SEGV is not an option - in this case we
> should run synchronous reclamation. This is done in beancounter
> patches v6 we've sent recently.
>

I thought about running synchronous reclamation, but then did not follow
that approach, I was not sure if calling the reclaim routines from the
page fault context is a good thing to do. It's worth trying out, since
it would provide better control over rss.


>> Signed-off-by: Balbir Singh <balbir@xxxxxxxxxx>
>> ---
>>
>> --- linux-2.6.19-rc2/mm/vmscan.c~container-memctlr-reclaim 2006-11-09 22:21:11.000000000 +0530
>> +++ linux-2.6.19-rc2-balbir/mm/vmscan.c 2006-11-09 22:21:11.000000000 +0530
>> @@ -36,6 +36,8 @@
>> #include <linux/rwsem.h>
>> #include <linux/delay.h>
>> #include <linux/kthread.h>
>> +#include <linux/container.h>
>> +#include <linux/memctlr.h>
>>
>> #include <asm/tlbflush.h>
>> #include <asm/div64.h>
>> @@ -65,6 +67,9 @@ struct scan_control {
>> int swappiness;
>>
>> int all_unreclaimable;
>> +
>> + int overlimit;
>> + void *container; /* Added as void * to avoid #ifdef's */
>> };
>>
>> /*
>> @@ -811,6 +816,10 @@ force_reclaim_mapped:
>> cond_resched();
>> page = lru_to_page(&l_hold);
>> list_del(&page->lru);
>> + if (!memctlr_page_reclaim(page, sc->container, sc->overlimit)) {
>> + list_add(&page->lru, &l_active);
>> + continue;
>> + }
>> if (page_mapped(page)) {
>> if (!reclaim_mapped ||
>> (total_swap_pages == 0 && PageAnon(page)) ||
>
> [snip] See comment below.
>
>>
>> +#ifdef CONFIG_RES_GROUPS_MEMORY
>> +/*
>> + * Modelled after shrink_all_memory
>> + */
>> +unsigned long memctlr_shrink_container_memory(unsigned long nr_pages,
>> + struct container *container,
>> + int overlimit)
>> +{
>> + unsigned long lru_pages;
>> + unsigned long ret = 0;
>> + int pass;
>> + struct zone *zone;
>> + struct scan_control sc = {
>> + .gfp_mask = GFP_KERNEL,
>> + .may_swap = 0,
>> + .swap_cluster_max = nr_pages,
>> + .may_writepage = 1,
>> + .swappiness = vm_swappiness,
>> + .overlimit = overlimit,
>> + .container = container,
>> + };
>> +
>
> [snip]
>
>> + for (prio = DEF_PRIORITY; prio >= 0; prio--) {
>> + unsigned long nr_to_scan = nr_pages - ret;
>> +
>> + sc.nr_scanned = 0;
>> + ret += shrink_all_zones(nr_to_scan, prio, pass, &sc);
>> + if (ret >= nr_pages)
>> + break;
>> +
>> + if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
>> + blk_congestion_wait(WRITE, HZ / 10);
>> + }
>> + }
>> + return ret;
>> +}
>> +#endif
>
> Please correct me if I'm wrong, but does this reclamation work like
> "run over all the zones' lists searching for page whose controller
> is sc->container" ?
>

Yeah, that's correct. The code can also reclaim memory from all over-the-limit
containers (by passing SC_OVERLIMIT_ALL). The idea behind using such a scheme
is to ensure that the global LRU list is not broken.


--
Thanks for the feedback,
Balbir Singh,
Linux Technology Center,
IBM Software Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/