Re: mem_cgroup_get_limit() return type, was [patch] memcg: fix unit mismatch in memcg oom limit calculation

From: Greg Thelen
Date: Tue Nov 09 2010 - 17:01:41 EST


David Rientjes <rientjes@xxxxxxxxxx> writes:

> On Tue, 9 Nov 2010, Greg Thelen wrote:
>
>> Johannes Weiner <hannes@xxxxxxxxxxx> writes:
>>
>> > Adding the number of swap pages to the byte limit of a memory control
>> > group makes no sense. Convert the pages to bytes before adding them.
>> >
>> > The only user of this code is the OOM killer, and the way it is used
>> > means that the error results in a higher OOM badness value. Since the
>> > cgroup limit is the same for all tasks in the cgroup, the error should
>> > have no practical impact at the moment.
>> >
>> > But let's not wait for future or changing users to trip over it.
>>
>> Thanks for the fix.
>>
>
> Nice catch, but it's done in the opposite way: the oom killer doesn't use
> byte limits but page limits. So this needs to be
>
> (res_counter_read_u64(&memcg->res, RES_LIMIT) >> PAGE_SHIFT) +
> total_swap_pages;

In -mm, the oom killer queries memcg for a byte limit using
mem_cgroup_get_limit(). The following is from
mem_cgroup_out_of_memory():

limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;

So I think the "[patch] memcg: fix unit mismatch in memcg oom limit
calculation" is correct. Although a simpler interface, would involve
changing mem_cgroup_get_limit() to return a page count instead of a byte
count and thus save the oom killer from having to do the conversion:

commit d12e5eded4505a673a7d77d8adab7fce30c7a680
Author: Greg Thelen <gthelen@xxxxxxxxxx>
Date: Tue Nov 9 13:46:38 2010 -0800

memcg: change mem_cgroup_get_limit() return type

The mem_cgroup_get_limit() interface routine returns a
byte count. The only consumer of this data is the oom
killer, which really wants a page count.

This change converts mem_cgroup_get_limit() to return a
page count rather than a byte count. This makes the
memcg interface more consistent with the rest of the mm.
This even makes the memcg interface more consistent. Most other
memcg interface routines operate on page counts, not byte counts.

Signed-off-by: Greg Thelen <gthelen@xxxxxxxxxx>

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3433784..0a8720e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -163,7 +163,7 @@ unsigned long mem_cgroup_page_stat(struct mem_cgroup *mem,

unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
gfp_t gfp_mask);
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem);
+unsigned long mem_cgroup_get_limit(struct mem_cgroup *mem);

#else /* CONFIG_CGROUP_MEM_RES_CTLR */
struct mem_cgroup;
@@ -368,7 +368,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
}

static inline
-u64 mem_cgroup_get_limit(struct mem_cgroup *mem)
+unsigned long mem_cgroup_get_limit(struct mem_cgroup *mem)
{
return 0;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b9ecdc..90efb5d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1569,22 +1569,23 @@ static int mem_cgroup_count_children(struct mem_cgroup *mem)
}

/*
- * Return the memory (and swap, if configured) limit for a memcg.
+ * Return the memory (and swap, if configured) limit for a memcg expressed as
+ * a page count.
*/
-u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
+unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
{
u64 limit;
u64 memsw;

- limit = res_counter_read_u64(&memcg->res, RES_LIMIT);
- limit += total_swap_pages << PAGE_SHIFT;
+ limit = res_counter_read_u64(&memcg->res, RES_LIMIT) >> PAGE_SHIFT;
+ limit += total_swap_pages;

- memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
+ memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> PAGE_SHIFT;
/*
* If memsw is finite and limits the amount of swap space available
* to this memcg, return that limit.
*/
- return min(limit, memsw);
+ return min(min(limit, memsw), ULONG_MAX);
}

/*
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7dcca55..9ccc59f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
struct task_struct *p;

check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
- limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
+ limit = mem_cgroup_get_limit(mem);
read_lock(&tasklist_lock);
retry:
p = select_bad_process(&points, limit, mem, NULL);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/