Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting

From: Vivek Goyal
Date: Thu Mar 17 2011 - 13:13:49 EST


On Thu, Mar 17, 2011 at 03:46:41PM +0100, Jan Kara wrote:

[..]
> > - bdi writeback: will revert some of the mmotm memcg dirty limit changes to
> > fs-writeback.c so that wb_do_writeback() will return to calling
> > wb_check_background_flush() to check background limits and to be
> > interruptible if a sync flush occurs. wb_check_background_flush() will
> > check the global memcg_over_bg_limit list for memcgs that are over their
> > dirty limit.
> > wb_writeback() will either (I am not sure):
> > a) scan memcg's bdi_memcg list of inodes (only some of them are dirty)
> > b) scan bdi dirty inode list (only some of them in memcg) using
> > inode_in_memcg() to identify inodes to write. inode_in_memcg(inode, memcg)
> > would walk memcg -> memcg_bdi -> memcg_mapping to determine if the memcg
> > is caching pages from the inode.
> Hmm, both have their problems. With a) we could queue all the dirty inodes
> from the memcg for writeback, but then we'd essentially write all dirty data
> for a memcg, not only enough data to get below the bg limit. And if we started
> skipping inodes once the memcg(s) an inode belongs to got below the bg limit,
> we'd risk moving inodes back and forth without reason, cases where some inodes
> never get written because they always end up skipped, etc. Also, the question
> of whether any of the memcgs an inode belongs to is still over its limit is the
> hardest part of solution b), so we wouldn't help ourselves much.
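To make option b) concrete, here is a minimal sketch of the inode_in_memcg(inode, memcg) walk described in the proposal. The struct layouts (a bdi pointer on the inode, singly linked memcg_bdi and memcg_mapping lists) are hypothetical simplifications, not the real kernel definitions. It only answers the membership question; deciding whether any memcg the inode belongs to is still over its limit, which is the hard part noted above, is not attempted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real structures. */
struct inode {
	const void *bdi;                 /* backing device the inode lives on */
};

/* One entry per inode this memcg has pages cached from (on one bdi). */
struct memcg_mapping {
	struct inode *inode;
	struct memcg_mapping *next;
};

/* Per-(memcg, bdi) node heading that memcg's mappings on the bdi. */
struct memcg_bdi {
	const void *bdi;
	struct memcg_mapping *mappings;
	struct memcg_bdi *next;
};

struct memcg {
	struct memcg_bdi *bdis;
};

/*
 * Walk memcg -> memcg_bdi -> memcg_mapping and report whether @memcg is
 * caching pages from @inode. The cost is linear in the memcg's mappings
 * on that bdi, and option b) would pay it for every dirty inode scanned.
 */
static bool inode_in_memcg(const struct inode *inode,
			   const struct memcg *memcg)
{
	assert(inode != NULL && memcg != NULL);
	for (const struct memcg_bdi *mb = memcg->bdis; mb; mb = mb->next) {
		if (mb->bdi != inode->bdi)
			continue;        /* only the inode's own bdi matters */
		for (const struct memcg_mapping *m = mb->mappings; m; m = m->next)
			if (m->inode == inode)
				return true;
	}
	return false;
}
```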

Maybe I am missing something, but can't we just traverse the
memcg_over_bg_limit list and, following option a), walk through each
group's list of inodes, writing them until that group is below its
limit? We would, of course, skip inodes which are not dirty.
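A minimal sketch of the per-group loop suggested here, under a hypothetical model: each inode on a memcg's list records how many dirty pages it has charged to that memcg, and writing an inode cleans all of them. writeback_memcg() and write_inode_pages() are illustrative names, not existing kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct inode {
	unsigned long nr_dirty;          /* dirty pages charged to this memcg */
	struct inode *next;              /* next inode on the memcg's list */
};

struct memcg {
	unsigned long nr_dirty;          /* total dirty pages charged */
	unsigned long bg_limit;          /* background dirty threshold */
	struct inode *inodes;            /* per-memcg inode list (option a) */
};

/* Stand-in for issuing writeback: "cleans" all of the inode's pages. */
static unsigned long write_inode_pages(struct memcg *memcg,
				       struct inode *inode)
{
	unsigned long written = inode->nr_dirty;

	assert(written <= memcg->nr_dirty);  /* accounting must be consistent */
	inode->nr_dirty = 0;
	memcg->nr_dirty -= written;
	return written;
}

/*
 * Walk the memcg's inode list, skipping clean inodes, and stop as soon
 * as the group is no longer over its background limit.
 * Returns the number of pages written.
 */
static unsigned long writeback_memcg(struct memcg *memcg)
{
	unsigned long written = 0;

	for (struct inode *inode = memcg->inodes;
	     inode && memcg->nr_dirty > memcg->bg_limit;
	     inode = inode->next) {
		if (inode->nr_dirty == 0)
			continue;        /* not dirty: nothing to write */
		written += write_inode_pages(memcg, inode);
	}
	return written;
}
```

Because whole inodes are written, the group may end up somewhat below its limit; the loop stops as soon as it is no longer over it, so the remaining inodes are left dirty.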

This assumes that the root group is also on that list, so that
inodes in the root group are not starved of writeback.

All the inodes would still stay on the bdi wb structure; the memcg
would just give us pointers to those inodes. So for background
writeback, instead of walking the dirty inode list serially, we would
first pick the cgroup to write and then the inode to write. Since we
would be doing round robin over the cgroup list, no cgroup (including
root) and no inode would be starved.
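The round-robin scheme can be sketched as below. This is a hypothetical user-space model, not kernel code: inodes are owned elsewhere (standing in for the bdi wb lists) and each memcg holds only pointers to them, every inode is assumed to be charged to exactly one group, and the root group is just another entry in the groups array. Each pass gives every over-limit group one dirty inode, and a per-group cursor rotates through that group's inodes, so neither groups nor inodes are starved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: inodes live on the bdi; memcgs only point at them. */
struct inode {
	unsigned long nr_dirty;
};

#define MAX_INODES 8

struct memcg {
	unsigned long nr_dirty;
	unsigned long bg_limit;
	struct inode *inodes[MAX_INODES]; /* pointers into the bdi's inodes */
	size_t nr_inodes;
	size_t cursor;                    /* next inode to try in this group */
};

static bool over_bg_limit(const struct memcg *cg)
{
	return cg->nr_dirty > cg->bg_limit;
}

/*
 * Write one dirty inode from @cg, resuming where the previous turn
 * stopped so inodes within a group are also visited round robin.
 * Returns pages written (0 if the group has no dirty inode left).
 */
static unsigned long write_one_inode(struct memcg *cg)
{
	for (size_t tries = 0; tries < cg->nr_inodes; tries++) {
		struct inode *inode = cg->inodes[cg->cursor];

		cg->cursor = (cg->cursor + 1) % cg->nr_inodes;
		if (inode->nr_dirty == 0)
			continue;        /* skip clean inodes */
		unsigned long n = inode->nr_dirty;

		assert(n <= cg->nr_dirty);
		inode->nr_dirty = 0;
		cg->nr_dirty -= n;
		return n;
	}
	return 0;
}

/*
 * Background writeback: round robin over all groups (root included),
 * giving each over-limit group one inode per pass, until no group is
 * over its limit or no further progress is possible.
 */
static unsigned long background_writeback(struct memcg **groups,
					  size_t nr_groups)
{
	unsigned long total = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < nr_groups; i++) {
			if (!over_bg_limit(groups[i]))
				continue;
			unsigned long n = write_one_inode(groups[i]);

			total += n;
			if (n)
				progress = true;
		}
	}
	return total;
}
```

Handing out one inode per group per pass is what keeps a group with many dirty inodes from monopolizing background writeback; a real implementation would more likely budget pages per turn than whole inodes.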

What am I missing?

Thanks
Vivek