Re: [PATCH] mm: memcg: fix over reclaiming mem cgroup

From: Michal Hocko
Date: Mon Jan 23 2012 - 08:02:22 EST


On Sat 21-01-12 22:49:23, Hillf Danton wrote:
> In soft limit reclaim, overreclaim occurs when pages are reclaimed from mem
> group that is under its soft limit, or when more pages are reclaimed than the
> exceeding amount, then performance of reclaimee goes down accordingly.

First of all, soft reclaim is more a help for balancing the global memory
pressure than any guarantee about how much we reclaim from a group.
We would need more changes in order to make it a guarantee.
For example, your implementation will cause severe problems when all
cgroups are soft unlimited (default conf.) or when nobody is above the
limit but the total consumption triggers the global reclaim. In both
cases nobody is in excess, so you would skip all groups and only bang on
the root memcg.
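
To make the failure mode concrete, here is a minimal userspace sketch (not
kernel code). struct model_memcg and the model_* helpers are hypothetical
names invented for illustration, and ULONG_MAX is assumed to stand in for the
default "unlimited" soft limit. It shows that a "skip when excess is zero"
rule leaves no group to reclaim from in the default configuration:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a memcg; not the kernel's struct. */
struct model_memcg {
	unsigned long usage_pages;
	unsigned long soft_limit_pages;	/* ULONG_MAX models "soft unlimited" */
};

/* Pages above the soft limit, or 0 when the group is at or under it. */
static unsigned long model_excess_pages(const struct model_memcg *cg)
{
	if (cg->usage_pages <= cg->soft_limit_pages)
		return 0;
	return cg->usage_pages - cg->soft_limit_pages;
}

/* Count the groups that a "skip when excess == 0" rule would still scan. */
static size_t model_count_reclaimable(const struct model_memcg *groups,
				      size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (model_excess_pages(&groups[i]) != 0)
			count++;
	return count;
}
```

With every soft limit left at ULONG_MAX, model_count_reclaimable() returns 0
no matter how large the usage is, which is the "skip all groups" case.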

Ying Han has a patch which basically skips all cgroups which are under
their limit until we reach a certain reclaim priority, but even for this
we need some additional changes, e.g. reversing the current default
setting of the soft limit.
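
The idea of a priority-gated skip can be sketched as follows. This is not
that patch, only an illustration of the description above. The cutoff
MODEL_SOFT_SKIP_PRIORITY and the helper name are hypothetical; the only fact
taken from the kernel is that reclaim priority counts down from DEF_PRIORITY
(12) towards 0 as pressure grows:

```c
#include <stdbool.h>

/* Same value as the kernel's DEF_PRIORITY; reclaim starts here. */
#define MODEL_DEF_PRIORITY		12
/* Hypothetical cutoff: below or at this priority nobody is skipped. */
#define MODEL_SOFT_SKIP_PRIORITY	10

/*
 * Skip a group only while it is not over its soft limit AND reclaim is
 * still in the early, low-pressure passes (numerically high priority).
 */
static bool model_should_skip(unsigned long excess_pages, int priority)
{
	return excess_pages == 0 && priority > MODEL_SOFT_SKIP_PRIORITY;
}
```

Once priority drops to the cutoff, groups under their limit become eligible
again, so global reclaim can still make progress when nobody is in excess.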

Anyway, I like the nr_to_reclaim reduction idea. We have to do this in
some way, because the global reclaim starts with ULONG nr_to_scan.
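
A minimal sketch of the reduction alone, without the skipping, might look
like this. model_reclaim_target() is a hypothetical helper, and leaving the
target untouched when the excess is zero is an assumption made here so that
groups under their soft limit stay reclaimable under global pressure:

```c
#include <limits.h>

/*
 * Hypothetical helper: cap the per-group reclaim target by the group's
 * soft limit excess. An excess of 0 leaves the target unchanged
 * (assumption: such groups are still scanned, just not capped).
 */
static unsigned long model_reclaim_target(unsigned long nr_to_reclaim,
					  unsigned long excess_pages)
{
	if (excess_pages != 0 && excess_pages < nr_to_reclaim)
		return excess_pages;
	return nr_to_reclaim;
}
```

For a caller that starts with a target of ULONG_MAX and a group that is 300
pages over its soft limit, the target becomes 300; a target that is already
smaller than the excess is left alone.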

> A helper function is added to compute the number of pages that exceed the soft
> limit of given mem cgroup, then the excess pages are used when every reclaimee
> is reclaimed to avoid overreclaim.
>
> Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
> ---
>
> --- a/mm/memcontrol.c Tue Jan 17 20:41:36 2012
> +++ b/mm/memcontrol.c Sat Jan 21 21:18:46 2012
> @@ -1662,6 +1662,21 @@ static int mem_cgroup_soft_reclaim(struc
>  	return total;
>  }
>
> +unsigned long mem_cgroup_excess_pages(struct mem_cgroup *memcg)
> +{
> +	unsigned long pages;
> +
> +	if (mem_cgroup_disabled())
> +		return 0;
> +	if (!memcg)
> +		return 0;
> +	if (mem_cgroup_is_root(memcg))
> +		return 0;
> +
> +	pages = res_counter_soft_limit_excess(&memcg->res) >> PAGE_SHIFT;
> +	return pages;
> +}
> +
>  /*
>   * Check OOM-Killer is already running under our hierarchy.
>   * If someone is running, return false.
> --- a/mm/vmscan.c Sat Jan 14 14:02:20 2012
> +++ b/mm/vmscan.c Sat Jan 21 21:30:06 2012
> @@ -2150,8 +2150,34 @@ static void shrink_zone(int priority, st
>  			.mem_cgroup = memcg,
>  			.zone = zone,
>  		};
> +		unsigned long old;
> +		bool clobbered = false;
> +
> +		if (memcg != NULL) {
> +			unsigned long excess;
> +
> +			excess = mem_cgroup_excess_pages(memcg);
> +			/*
> +			 * No bother reclaiming pages from mem cgroup that
> +			 * is under soft limit
> +			 */
> +			if (!excess)
> +				goto next;
> +			/*
> +			 * And reclaim no more pages than excess
> +			 */
> +			if (excess < sc->nr_to_reclaim) {
> +				old = sc->nr_to_reclaim;
> +				sc->nr_to_reclaim = excess;
> +				clobbered = true;
> +			}
> +		}
>
>  		shrink_mem_cgroup_zone(priority, &mz, sc);
> +
> +		if (clobbered)
> +			sc->nr_to_reclaim = old;
> +next:
>  		/*
>  		 * Limit reclaim has historically picked one memcg and
>  		 * scanned it with decreasing priority levels until
> --- a/include/linux/memcontrol.h Thu Jan 19 22:03:14 2012
> +++ b/include/linux/memcontrol.h Sat Jan 21 21:35:50 2012
> @@ -161,6 +161,7 @@ unsigned long mem_cgroup_soft_limit_recl
> gfp_t gfp_mask,
> unsigned long *total_scanned);
> u64 mem_cgroup_get_limit(struct mem_cgroup *memcg);
> +unsigned long mem_cgroup_excess_pages(struct mem_cgroup *memcg);
>
> void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx);
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> @@ -376,6 +377,11 @@ unsigned long mem_cgroup_soft_limit_recl
>
>  static inline
>  u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned long mem_cgroup_excess_pages(struct mem_cgroup *memcg)
>  {
>  	return 0;
>  }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic