Re: [PATCH -mm][preview] memcg: a patch series for next [8/9]

From: KAMEZAWA Hiroyuki
Date: Fri Aug 22 2008 - 07:48:24 EST


On Fri, 22 Aug 2008 19:29:43 +0900
Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx> wrote:

> Hi.
>
> I think you are making updated ones, I send comments so far.
>
Ah, sorry. I just sent one ;(.

> On Tue, 19 Aug 2008 17:44:04 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > Very experimental...
> >
> > mem+swap controller prototype.
> >
> > This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP as memory resource
> > controller's swap extension.
> >
> > When enabling this, memory resource controller will have 2 limits.
> >
> > - memory.limit_in_bytes .... limit for pages
> > - memory.memsw_limit_in_bytes .... limit for pages + swaps.
> >
> > Following is (easy) accounting state transion after this patch.
> >
> > pages swaps pages_total memsw_total
> > +1 - +1 +1 new page allocation.
> > -1 +1 -1 - swap out.
> > +1 -1 0 - swap in (*).
> > - -1 - -1 swap_free.
> >
> What do you mean by "pages_total"?
>
On memory resource.
The sum of
- mapped anonymous pages
- pages of file cache.
- pages of swap cache.


> > At swap-out, swp_entry will be charged against the cgroup of the page.
> > At swap-in, the page will be charged when it's mapped.
> > (Maybe accounting at read_swap() will be beautiful but we can avoid some of
> > error handling to delay accounting until mem_cgroup_charge().)
> >
> > The charge against swap_entry will be uncharged when swap_entry is freed.
> >
> > The parameter res.swaps just includes swaps not-on swap cache.
> > So, this doesn't show real usage of swp_entry just shows swp_entry on disk.
> >
> IMHO, it would be better to "show" real usage of swp_entry.
> Otherwise, "sum of swap usage of all groups" != "swap usage of
> system shown by meminfo"(but it means adding another counter, hmm...).
>
Yes it means to add another counter. I'd like to try it.

> Instead of showing the usage of disk_swap, how about showing
> the memsw total usage, which is to be limited by user.
>

Yes, I feel the amount of disk_swap is not very useful in my test.
Ok, showing memsw total somewhere.


> > This patch doesn't include codes for control files.
> >
> > TODO:
> > - clean up. and add comments.
> > - support vm_swap_full() under cgroup.
> Is it needed?
>
maybe.

> In my swap controller, swap entries are limited per cgroup.
> So, to make swap_cgroup_charge() fail less frequently,
> vm_swap_full() should be calculated per cgroup so that
> vm can free swap entries in advance.
>
> But I think in mem+swap controller the situation is different.
>
Hmm, I'd like to postpone this until we ends the test.


> > - find easier-to-understand protocol....
> > - check force_empty....(maybe buggy)
> > - support page migration.
> > - test!!
> >
> And,
> - move charge along with task move
yes and moving charge of on-memory-resource should be done, too.

> - hierarchy support
>
> Of course, more basic features and stabilization should be done first.
>
Yes ;)

>
> I agree with this patch as a whole, but I'm worrying about race
> between swapout and swapin about the same entry(I should consider more...).
>
>
swapout/swapin race is guarded by the face I always handle swap-cache.

add_to_swap_cache/delete_from_swap_cache is under lock_page().
and do_swap_page()'s charging is moved under lock_page().

I saw race with force_empty ;(. I hope it's fixed in the latest version.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/