Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit

From: Balbir Singh
Date: Mon Feb 22 2010 - 12:36:52 EST


* Vivek Goyal <vgoyal@xxxxxxxxxx> [2010-02-22 09:27:45]:

> On Sun, Feb 21, 2010 at 04:18:43PM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> >
> > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim)
> > page cache used by any cgroup. So, in case of multiple cgroup writers, they
> > will not be able to consume more than their designated share of dirty pages and
> > will be forced to perform write-out if they cross that limit.
> >
> > The overall design is the following:
> >
> > - account dirty pages per cgroup
> > - limit the number of dirty pages via memory.dirty_bytes in cgroupfs
> > - start to write-out in balance_dirty_pages() when the cgroup or global limit
> > is exceeded
> >
> > This feature is supposed to be strictly connected to any underlying IO
> > controller implementation, so we can stop increasing dirty pages in VM layer
> > and enforce a write-out before any cgroup will consume the global amount of
> > dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes limit.
> >
>
> Thanks Andrea. I had been thinking about looking into it from IO
> controller perspective so that we can control async IO (buffered writes
> also).
>
> Before I dive into patches, two quick things.
>
> - IIRC, last time you had implemented per memory cgroup "dirty_ratio" and
> not "dirty_bytes". Why this change? To begin with, wouldn't a per memcg
> configurable dirty ratio also make sense? By default it can be the
> global dirty ratio for each cgroup.
>
> - Looks like we will start writeout from memory cgroup once we cross the
> dirty ratio, but still there is no guarantee that we will be writing pages
> belonging to cgroup which crossed the dirty ratio and triggered the
> writeout.
>
> This behavior is not very good at least from IO controller perspective
> where if two dd threads are dirtying memory in two cgroups, then if
> one crosses its dirty ratio, it should perform writeouts of its own pages
> and not other cgroups' pages. Otherwise we probably will again introduce
> serialization among two writers and will not see service differentiation.

I thought that the I/O controller would eventually provide hooks to do
this.. no?

>
> Maybe we can modify writeback_inodes_wbc() to check the first dirty page
> of the inode. And if it does not belong to the same memcg as the task that
> is performing balance_dirty_pages(), then skip that inode.

Do you expect all pages of an inode to be paged in by the same cgroup?
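
To make the concern concrete, here is a toy user-space model (plain
Python, not kernel code). Page, Inode and the helper names are made up
for illustration, and the filter is only my reading of the "check the
first dirty page" idea, not how writeback_inodes_wbc() behaves today.
It assumes a file can have pages charged to more than one cgroup.

```python
# Toy user-space model, not kernel code. All names here are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    memcg: str   # cgroup charged for this page
    dirty: bool


@dataclass
class Inode:
    name: str
    pages: List[Page]


def first_dirty_memcg(inode: Inode) -> Optional[str]:
    """Return the memcg of the first dirty page, or None if the inode is clean."""
    for page in inode.pages:
        if page.dirty:
            return page.memcg
    return None


def inodes_to_write(inodes: List[Inode], memcg: str) -> List[str]:
    """Apply the proposed filter: keep inodes whose first dirty page is ours."""
    return [i.name for i in inodes if first_dirty_memcg(i) == memcg]


def dirty_pages_skipped(inodes: List[Inode], memcg: str) -> int:
    """Count dirty pages of `memcg` that sit in inodes the filter skips."""
    picked = set(inodes_to_write(inodes, memcg))
    return sum(
        1
        for i in inodes
        if i.name not in picked
        for p in i.pages
        if p.dirty and p.memcg == memcg
    )


# A file shared by two cgroups: A dirtied the first page, B the next two.
shared = Inode("shared.log", [Page("A", True), Page("B", True), Page("B", True)])
private = Inode("b.dat", [Page("B", True)])

print(inodes_to_write([shared, private], "B"))      # only b.dat is selected
print(dirty_pages_skipped([shared, private], "B"))  # B's pages in shared.log
```

In this model cgroup B is over its limit but the filter never picks
shared.log for it, so B's dirty pages there stay behind until A (or the
flusher) writes that inode. Whether that matters in practice depends on
how common shared files are for the workloads we care about.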


--
Three Cheers,
Balbir