Re: [patch] Revert "memcg: add memory.vmscan_stat"

From: KAMEZAWA Hiroyuki
Date: Tue Aug 30 2011 - 05:03:53 EST


On Tue, 30 Aug 2011 10:42:45 +0200
Johannes Weiner <jweiner@xxxxxxxxxx> wrote:

> On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 30 Aug 2011 09:04:24 +0200
> > Johannes Weiner <jweiner@xxxxxxxxxx> wrote:
> >
> > > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > > > spin_lock(&memcg->scanstat.lock);
> > > > __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > > > spin_unlock(&memcg->scanstat.lock);
> > > > -
> > > > - memcg = rec->root;
> > > > - spin_lock(&memcg->scanstat.lock);
> > > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > > - spin_unlock(&memcg->scanstat.lock);
> > > > + cgroup = memcg->css.cgroup;
> > > > + do {
> > > > + spin_lock(&memcg->scanstat.lock);
> > > > + __mem_cgroup_record_scanstat(
> > > > + memcg->scanstat.hierarchy_stats[context], rec);
> > > > + spin_unlock(&memcg->scanstat.lock);
> > > > + if (!cgroup->parent)
> > > > + break;
> > > > + cgroup = cgroup->parent;
> > > > + memcg = mem_cgroup_from_cont(cgroup);
> > > > + } while (memcg->use_hierarchy && memcg != rec->root);
> > >
> > > Okay, so this looks correct, but it sums up all parents after each
> > > memcg scanned, which could have a performance impact. Usually,
> > > hierarchy statistics are only summed up when a user reads them.
> > >
> > Hmm. But sum-at-read doesn't work.
> >
> > Assume 3 cgroups in a hierarchy.
> >
> > A
> > /
> > B
> > /
> > C
> >
> > C's scan contains 3 causes.
> > C's scan caused by limit of A.
> > C's scan caused by limit of B.
> > C's scan caused by limit of C.
> >
> > If we make hierarchy sum at read, we think
> > B's scan_stat = B's scan_stat + C's scan_stat
> > But in precice, this is
> >
> > B's scan_stat = B's scan_stat caused by B +
> > B's scan_stat caused by A +
> > C's scan_stat caused by C +
> > C's scan_stat caused by B +
> > C's scan_stat caused by A.
> >
> > In orignal version.
> > B's scan_stat = B's scan_stat caused by B +
> > C's scan_stat caused by B +
> >
> > After this patch,
> > B's scan_stat = B's scan_stat caused by B +
> > B's scan_stat caused by A +
> > C's scan_stat caused by C +
> > C's scan_stat caused by B +
> > C's scan_stat caused by A.
> >
> > Hmm...removing hierarchy part completely seems fine to me.
>
> I see.
>
> You want to look at A and see whether its limit was responsible for
> reclaim scans in any children. IMO, that is asking the question
> backwards. Instead, there is a cgroup under reclaim and one wants to
> find out the cause for that. Not the other way round.
>
> In my original proposal I suggested differentiating reclaim caused by
> internal pressure (due to own limit) and reclaim caused by
> external/hierarchical pressure (due to limits from parents).
>
> If you want to find out why C is under reclaim, look at its reclaim
> statistics. If the _limit numbers are high, C's limit is the problem.
> If the _hierarchical numbers are high, the problem is B, A, or
> physical memory, so you check B for _limit and _hierarchical as well,
> then move on to A.
>
> Implementing this would be as easy as passing not only the memcg to
> scan (victim) to the reclaim code, but also the memcg /causing/ the
> reclaim (root_mem):
>
> root_mem == victim -> account to victim as _limit
> root_mem != victim -> account to victim as _hierarchical
>
> This would make things much simpler and more natural, both the code
> and the way of tracking down a problem, IMO.
>

hmm. I have no strong opinion.


> > > I don't get why this has to be done completely different from the way
> > > we usually do things, without any justification, whatsoever.
> > >
> > > Why do you want to pass a recording structure down the reclaim stack?
> >
> > Just for reducing number of passed variables.
>
> It's still sitting on bottom of the reclaim stack the whole time.
>
> With my proposal, you would only need to pass the extra root_mem
> pointer.
>

I'm sorry I miss something. Do you say to add a function like

mem_cgroup_record_reclaim_stat(memcg, root_mem, anon_scan, anon_free, anon_rotate,
file_scan, file_free, elapsed_ns)

?

I'll prepare a patch, tomorrow.

Thanks,
-Kame






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/