Re: [PATCH -V2 4/9] memcg: Add non reclaim resource tracking tomemcg

From: KAMEZAWA Hiroyuki
Date: Thu Mar 08 2012 - 00:58:11 EST


On Sun, 04 Mar 2012 23:37:22 +0530
"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 2 Mar 2012 17:38:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Thu, 1 Mar 2012 14:46:15 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>

> >
> > > + help
> > > + Add non reclaim resource management to memory resource controller.
> > > + Currently only HugeTLB pages will be managed using this extension.
> > > + The controller limit is enforced during mmap(2), so that
> > > + application can fall back to allocations using smaller page size
> > > + if the memory controller limit prevented them from allocating HugeTLB
> > > + pages.
> > > +
> >
> > Hm. In other thread, KMEM accounting is discussed. There is 2 proposals and
> > - 1st is accounting only reclaimable slabs (as dcache etc.)
> > - 2nd is accounting all slab allocations.
> >
> > Here, 2nd one includes NORECLAIM kmem cache. (Discussion is not ended.)
> >
> > So, for your developments, How about MEM_RES_CTLR_HUGEPAGE ?
>
> Frankly I didn't like the noreclaim name, I also didn't want to indicate
> HUGEPAGE, because the code doesn't make any huge page assumption.

You can add this config for HUGEPAGE interfaces.
Later we can sort out other configs.


> >
> >
> > > config CGROUP_MEM_RES_CTLR_SWAP
> > > bool "Memory Resource Controller Swap Extension"
> > > depends on CGROUP_MEM_RES_CTLR && SWAP
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 6728a7a..b00d028 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -49,6 +49,7 @@
> > > #include <linux/page_cgroup.h>
> > > #include <linux/cpu.h>
> > > #include <linux/oom.h>
> > > +#include <linux/region.h>
> > > #include "internal.h"
> > > #include <net/sock.h>
> > > #include <net/tcp_memcontrol.h>
> > > @@ -214,6 +215,11 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg);
> > > static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> > >
> > > /*
> > > + * Currently only hugetlbfs pages are tracked using no reclaim
> > > + * resource count. So we need only MAX_HSTATE res counter
> > > + */
> > > +#define MEMCG_MAX_NORECLAIM HUGE_MAX_HSTATE
> > > +/*
> > > * The memory controller data structure. The memory controller controls both
> > > * page cache and RSS per cgroup. We would eventually like to provide
> > > * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> > > @@ -235,6 +241,11 @@ struct mem_cgroup {
> > > */
> > > struct res_counter memsw;
> > > /*
> > > + * the counter to account for non reclaim resources
> > > + * like hugetlb pages
> > > + */
> > > + struct res_counter no_rcl_res[MEMCG_MAX_NORECLAIM];
> >
> > struct res_counter hugepages;
> >
> > will be ok.
> >
>
> My goal was to make this patch not to mention hugepages, because
> it doesn't really have any depedency on hugepages. That is one of the reason
> for adding MEMCG_MAX_NORECLAIM. Later if we want other in memory file system
> (shmemfs) to limit the resource usage in a similar fashion, we should be
> able to use this memcg changes.
>
> May be for this patchset I can make the changes you suggested and later
> when we want to reuse the code make it more generic ?
>

yes. If there is no user interface change, internal code change will be welcomed.


>
> >
> > > + /*
> > > * Per cgroup active and inactive list, similar to the
> > > * per zone LRU lists.
> > > */
> > > @@ -4887,6 +4898,7 @@ err_cleanup:
> > > static struct cgroup_subsys_state * __ref
> > > mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > {
> > > + int idx;
> > > struct mem_cgroup *memcg, *parent;
> > > long error = -ENOMEM;
> > > int node;
> > > @@ -4922,6 +4934,10 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > > if (parent && parent->use_hierarchy) {
> > > res_counter_init(&memcg->res, &parent->res);
> > > res_counter_init(&memcg->memsw, &parent->memsw);
> > > + for (idx = 0; idx < MEMCG_MAX_NORECLAIM; idx++) {
> > > + res_counter_init(&memcg->no_rcl_res[idx],
> > > + &parent->no_rcl_res[idx]);
> > > + }
> >
> > You can remove this kinds of loop and keep your implemenation simple.
>
>
> Can you explain this ? How can we remote the loop ?. We want to track
> each huge page size as a seperate resource.
>
Ah, sorry. I miseed it. please ignore.



> > > +long mem_cgroup_try_noreclaim_charge(struct list_head *chg_list,
> > > + unsigned long from, unsigned long to,
> > > + int idx)
> > > +{
> > > + long chg;
> > > + int ret = 0;
> > > + unsigned long csize;
> > > + struct mem_cgroup *memcg;
> > > + struct res_counter *fail_res;
> > > +
> > > + /*
> > > + * Get the task cgroup within rcu_readlock and also
> > > + * get cgroup reference to make sure cgroup destroy won't
> > > + * race with page_charge. We don't allow a cgroup destroy
> > > + * when the cgroup have some charge against it
> > > + */
> > > + rcu_read_lock();
> > > + memcg = mem_cgroup_from_task(current);
> > > + css_get(&memcg->css);
> >
> > css_tryget() ?
> >
>
>
> Why ?
>

current<->cgroup relationship isn't under any locks. So, we do speculative
access with rcu_read_lock() and css_tryget().

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/