Re: [RFC] Shared page accounting for memory cgroup

From: Daisuke Nishimura
Date: Mon Jan 18 2010 - 21:38:37 EST

On Tue, 19 Jan 2010 07:19:42 +0530, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jan 19, 2010 at 6:52 AM, Daisuke Nishimura
> <nishimura@xxxxxxxxxxxxxxxxx> wrote:
> [snip]
> >> Correct, file cache is almost always considered shared, so it has
> >>
> >> 1. non-private or shared usage of 10MB
> >> 2. 10 MB of file cache
> >>
> >> > I don't think "non private usage" is appropriate to this value.
> >> > Why don't you just show "sum_of_each_process_rss" ? I think it would be easier
> >> > to understand for users.
> >>
> >> Here is my concern
> >>
> >> 1. The gap between looking at memcg stat and sum of all RSS is way
> >> higher in user space
> >> 2. Summing up all rss without walking the tasks atomically can and
> >> will lead to consistency issues. Data can be stale as long as it
> >> represents a consistent snapshot of data
> >>
> >> We need to differentiate between
> >>
> >> 1. Data snapshot (taken at a time, but valid at that point)
> >> 2. Data taken from different sources that does not form a uniform
> >> snapshot, because the timestamp of each of the collected data
> >> items is different
> >>
> > Hmm, I'm sorry, I can't understand why you need the "difference".
> > IOW, what can users or middleware learn from the value in the above case
> > (0MB in 01 and 10MB in 02)? I've read this thread, but I can't understand
> > this point... Why would this value mean some of the groups are "heavy"?
> >
> Consider a default cgroup that is not root and assume all applications
> move there initially. Now with a lot of shared memory,
> the default cgroup will be the first one to page in a lot of the
> memory and its usage will be very high. Without the concept of
> showing how much is non-private, how does one decide if the default
> cgroup is using a lot of memory or sharing it? How
> do we decide on limits of a cgroup without knowing its actual usage -
> PSS equivalent for a region of memory for a task.
As for the limit, I think we should decide it based on the actual usage, because
the actual usage is what we account and limit. Why should we take the sum of rss into account?
I agree that we'd better not ignore the sum of rss completely, but could you show me
in detail how the value 0MB/10MB can be used to calculate the limit in 01/02?
I wouldn't argue against you if I understood how the value would be useful,
but I can't see how it can be used, so I'm asking :)
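
To make clear what I mean by "sum_of_each_process_rss", here is a rough
user-space sketch. It assumes a file that lists one process ID per line for
the group and a procfs that exposes the resident page count as the second
field of <pid>/statm. The function names and the idea of passing the paths in
as parameters are only for illustration, not an existing interface. Note that
it reads each task at a different time, so it is exactly the kind of
non-uniform snapshot you describe, and if the list holds thread IDs, threads
sharing an mm would be counted more than once.

```python
import os


def read_group_pids(tasks_path):
    """Return the IDs listed one per line in the given file (hypothetical path)."""
    with open(tasks_path) as f:
        return [int(line) for line in f if line.strip()]


def rss_bytes(pid, proc_root="/proc", page_size=None):
    """RSS of one task in bytes, from the 2nd field of <proc_root>/<pid>/statm."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    try:
        with open(os.path.join(proc_root, str(pid), "statm")) as f:
            resident_pages = int(f.read().split()[1])
    except (FileNotFoundError, ProcessLookupError):
        # The task exited between listing the group and reading its statm.
        return 0
    return resident_pages * page_size


def sum_of_each_process_rss(tasks_path, proc_root="/proc", page_size=None):
    """Sum RSS over all listed tasks; each task is sampled at a different time."""
    return sum(rss_bytes(pid, proc_root, page_size)
               for pid in read_group_pids(tasks_path))
```

Pages mapped by several tasks in the group are counted once per task here,
which is why this sum can exceed what the group is actually charged for.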

Daisuke Nishimura.