Re: [PATCH 1/2] memcg: coalescing uncharge at unmap and truncation(fixed coimpile bug)

From: KAMEZAWA Hiroyuki
Date: Thu Oct 08 2009 - 19:52:41 EST


On Thu, 8 Oct 2009 15:17:10 -0700
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 2 Oct 2009 16:02:13 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> >
> > ...
> >
> > In massive parallel enviroment, res_counter can be a performance bottleneck.
> > One strong techinque to reduce lock contention is reducing calls by
> > coalescing some amount of calls into one.
> >
> > Considering charge/uncharge chatacteristic,
> > - charge is done one by one via demand-paging.
> > - uncharge is done by
> > - in chunk at munmap, truncate, exit, execve...
> > - one by one via vmscan/paging.
> >
> > It seems we have a chance in uncharge at unmap/truncation.
> >
> > This patch is a for coalescing uncharge. For avoiding scattering memcg's
> > structure to functions under /mm, this patch adds memcg batch uncharge
> > information to the task.
> >
> > The degree of coalescing depends on callers
> > - at invalidate/trucate... pagevec size
> > - at unmap ....ZAP_BLOCK_SIZE
> > (memory itself will be freed in this degree.)
> > Then, we'll not coalescing too much.
> >
> >
> > ...
> >
> > +static void
> > +__do_uncharge(struct mem_cgroup *mem, const enum charge_type ctype)
> > +{
> > + struct memcg_batch_info *batch = NULL;
> > + bool uncharge_memsw = true;
> > + /* If swapout, usage of swap doesn't decrease */
> > + if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
> > + uncharge_memsw = false;
> > + /*
> > + * do_batch > 0 when unmapping pages or inode invalidate/truncate.
> > + * In those cases, all pages freed continously can be expected to be in
> > + * the same cgroup and we have chance to coalesce uncharges.
> > + * And, we do uncharge one by one if this is killed by OOM.
> > + */
> > + if (!current->memcg_batch.do_batch || test_thread_flag(TIF_MEMDIE))
> > + goto direct_uncharge;
> > +
> > + batch = &current->memcg_batch;
> > + /*
> > + * In usual, we do css_get() when we remember memcg pointer.
> > + * But in this case, we keep res->usage until end of a series of
> > + * uncharges. Then, it's ok to ignore memcg's refcnt.
> > + */
> > + if (!batch->memcg)
> > + batch->memcg = mem;
> > + /*
> > + * In typical case, batch->memcg == mem. This means we can
> > + * merge a series of uncharges to an uncharge of res_counter.
> > + * If not, we uncharge res_counter ony by one.
> > + */
> > + if (batch->memcg != mem)
> > + goto direct_uncharge;
> > + /* remember freed charge and uncharge it later */
> > + batch->pages += PAGE_SIZE;
>
> ->pages is really confusingly named. It doesn't count pages, it counts
> bytes!
>
> We could call it `bytes', but perhaps charge_bytes would be more
> communicative?
>
Ah, I agree. I'll change this "bytes".

> > +/*
> > + * batch_start/batch_end is called in unmap_page_range/invlidate/trucate.
> > + * In that cases, pages are freed continuously and we can expect pages
> > + * are in the same memcg. All these calls itself limits the number of
> > + * pages freed at once, then uncharge_start/end() is called properly.
> > + */
> > +
> > +void mem_cgroup_uncharge_start(void)
> > +{
> > + if (!current->memcg_batch.do_batch) {
> > + current->memcg_batch.memcg = NULL;
> > + current->memcg_batch.pages = 0;
> > + current->memcg_batch.memsw = 0;
>
> what's memsw?
>
Ah, memccontol.c uses "memsw" in several parts.

For example, memory usage interface to user is shown as

memory.usage_in_bytes

memory+swap usage interface to suer is shown as

memory.memsw.usage_in_bytes.

But, Hmm, this is visible from sched.c then...

memsw_bytes or memory_and_swap_bytes ?



> > + }
> > + current->memcg_batch.do_batch++;
> > +}
> > +
> >
> > ...
> >
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR /* memcg uses this to do batch job */
> > + struct memcg_batch_info {
> > + int do_batch;
> > + struct mem_cgroup *memcg;
> > + long pages, memsw;
> > + } memcg_batch;
> > +#endif
>
> I find the most valuable documetnation is that which is devoted to the
> data structures. This one didn't get any :(
>
> Negative values of `pages' and `memsw' are meaningless, so it would be
> better to give them an unsigned type. That matches the
> res_counter_charge() expectations also.
>
Agreed. I'll rewrite this part with appropriate comments.
Thank you for pointing out.

Regards,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/