Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting

From: Johannes Weiner
Date: Wed Mar 16 2011 - 17:52:43 EST

On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote:
> On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote:
> >> I think even for background we shall have to implement some kind of logic
> >> where inodes are selected by traversing memcg->lru list so that for
> >> background write we don't end up writing too many inodes from other
> >> root group in an attempt to meet the low background ratio of memcg.
> >>
> >> So to me it boils down to coming up with a new inode selection logic for
> >> memcg which can be used both for background as well as foreground
> >> writes. This will make sure we don't end up writing pages from the
> >> inodes we don't want to.
> >
> > Originally for struct page_cgroup reduction, I had the idea of
> > introducing something like
> >
> >        struct memcg_mapping {
> >                struct address_space *mapping;
> >                struct mem_cgroup *memcg;
> >        };
> >
> > hanging off page->mapping to make memcg association no longer per-page
> > and save the pc->memcg linkage (it's not completely per-inode either,
> > multiple memcgs can still refer to a single inode).
> >
> > We could put these descriptors on a per-memcg list and write inodes
> > from this list during memcg-writeback.
> >
> > We would have the option of extending this structure to contain hints
> > as to which subrange of the inode is actually owned by the cgroup, to
> > further narrow writeback to the right pages - iff shared big files
> > become a problem.
> >
> > Does that sound feasible?
>
> If I understand your memcg_mapping proposal, then each inode could
> have a collection of memcg_mapping objects representing the set of
> memcg that were charged for caching pages of the inode's data. When a
> new file page is charged to a memcg, then the inode's set of
> memcg_mapping would be scanned to determine if current's memcg is
> already in the memcg_mapping set. If this is the first page for the
> memcg within the inode, then a new memcg_mapping would be allocated
> and attached to the inode. The memcg_mapping may be reference counted
> and would be deleted when the last inode page for a particular memcg
> is uncharged.

Dead-on. Well, on which side you put the list - a per-memcg list of
inodes, or a per-inode list of memcgs - really depends on which way
you want to do the lookups. But this is the idea, yes.
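
To make the per-memcg direction concrete, here is a rough userspace
sketch of walking such a list during memcg writeback. It is not kernel
code: the `mappings` list head, the `memcg_next` link, the `nr_dirty`
counter and both function names are made up for illustration, and
decrementing a counter stands in for real I/O.

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup;

/* Stand-in for the real struct; nr_dirty is a toy dirty-page counter. */
struct address_space {
	unsigned long nr_dirty;
};

struct memcg_mapping {
	struct address_space *mapping;
	struct mem_cgroup *memcg;
	struct memcg_mapping *memcg_next;	/* hypothetical per-memcg link */
};

struct mem_cgroup {
	struct memcg_mapping *mappings;		/* hypothetical list head */
};

/* Put a descriptor on its memcg's list (hypothetical helper). */
static void memcg_track_mapping(struct mem_cgroup *memcg,
				struct memcg_mapping *mm,
				struct address_space *mapping)
{
	assert(memcg && mm && mapping);
	mm->mapping = mapping;
	mm->memcg = memcg;
	mm->memcg_next = memcg->mappings;
	memcg->mappings = mm;
}

/*
 * Write up to nr_to_write pages, visiting only mappings this memcg has
 * pages in.  Returns the number of pages "written".  Note the toy
 * counter is per mapping, so a shared file is still written as a whole;
 * the subrange hints mentioned earlier would be needed to narrow that.
 */
static unsigned long memcg_writeback(struct mem_cgroup *memcg,
				     unsigned long nr_to_write)
{
	unsigned long written = 0;
	struct memcg_mapping *mm;

	for (mm = memcg->mappings; mm && written < nr_to_write;
	     mm = mm->memcg_next) {
		unsigned long n = mm->mapping->nr_dirty;

		if (n > nr_to_write - written)
			n = nr_to_write - written;
		mm->mapping->nr_dirty -= n;	/* stands in for real I/O */
		written += n;
	}
	return written;
}
```

The point of the sketch is only that inode selection never leaves the
memcg's own list, so mappings charged solely to other groups are not
touched.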

> page->mapping = &memcg_mapping
> inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge

If the memcg_mapping list (or hash-table for quick find-or-create?)
were to be on the inode side, I'd put it in struct address_space, since
this is all about page cache, not so much an fs thing.
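
For the address_space side, the find-or-create and the teardown on last
uncharge could look roughly like the userspace model below. Everything
beyond the two fields of struct memcg_mapping is an assumption made for
the sketch: the `memcg_mappings` field, the `nr_pages` count used as the
reference count, the singly linked chain, and the function names.
Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; the real kernel structs are far larger. */
struct mem_cgroup {
	int id;
};

struct address_space;

struct memcg_mapping {
	struct address_space *mapping;
	struct mem_cgroup *memcg;
	unsigned long nr_pages;		/* charged pages, doubles as refcount (assumed) */
	struct memcg_mapping *next;	/* per-mapping chain (assumed) */
};

struct address_space {
	struct memcg_mapping *memcg_mappings;	/* hypothetical field */
};

/*
 * Charge one page: find the descriptor for (mapping, memcg), or create
 * it if this is the memcg's first page in the mapping.
 */
static struct memcg_mapping *
memcg_mapping_charge(struct address_space *mapping, struct mem_cgroup *memcg)
{
	struct memcg_mapping *mm;

	for (mm = mapping->memcg_mappings; mm; mm = mm->next)
		if (mm->memcg == memcg)
			goto found;

	mm = calloc(1, sizeof(*mm));
	if (!mm)
		return NULL;
	mm->mapping = mapping;
	mm->memcg = memcg;
	mm->next = mapping->memcg_mappings;
	mapping->memcg_mappings = mm;
found:
	mm->nr_pages++;
	return mm;
}

/* Uncharge one page; unlink and free the descriptor on the last one. */
static void memcg_mapping_uncharge(struct memcg_mapping *mm)
{
	struct memcg_mapping **p;

	assert(mm->nr_pages > 0);
	if (--mm->nr_pages)
		return;

	/* mm is known to be on its mapping's chain, so this terminates. */
	for (p = &mm->mapping->memcg_mappings; *p != mm; p = &(*p)->next)
		;
	*p = mm->next;
	free(mm);
}
```

A linear chain is only reasonable while few memcgs share one mapping;
a hash-table would replace the loop in memcg_mapping_charge() if the
lookup turned out to be hot.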

Still, correct in general.