Re: [PATCH 10/10] mm: per device dirty threshold

From: Miklos Szeredi
Date: Mon Apr 23 2007 - 02:32:15 EST


> > > > The other deadlock, in throttle_vm_writeout() is still to be solved.
> > >
> > > Let's go back to the original changelog:
> > >
> > > Author: marcelo.tosatti <marcelo.tosatti>
> > > Date: Tue Mar 8 17:25:19 2005 +0000
> > >
> > > [PATCH] vm: pageout throttling
> > >
> > > With silly pageout testcases it is possible to place huge amounts of memory
> > > under I/O. With a large request queue (CFQ uses 8192 requests) it is
> > > possible to place _all_ memory under I/O at the same time.
> > >
> > > This means that all memory is pinned and unreclaimable and the VM gets
> > > upset and goes oom.
> > >
> > > The patch limits the amount of memory which is under pageout writeout to be
> > > a little more than the amount of memory at which balance_dirty_pages()
> > > callers will synchronously throttle.
> > >
> > > This means that heavy pageout activity can starve heavy writeback activity
> > > completely, but heavy writeback activity will not cause starvation of
> > > pageout. Because we don't want a simple `dd' to be causing excessive
> > > latencies in page reclaim.
> > >
> > > Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
> > > Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
> > >
> > > (A good one! I wrote it ;))
> > >
> > >
> > > I believe that the combination of dirty-page-tracking and its calls to
> > > balance_dirty_pages() mean that we can now never get more than dirty_ratio
> > > of memory into the dirty-or-writeback condition.
> > >
> > > The vm scanner can convert dirty pages into clean, under-writeback pages,
> > > but it cannot increase the total of dirty+writeback.
> >
> > What about swapout? That can increase the number of writeback pages,
> > without decreasing the number of dirty pages, no?
>
> Could we not solve that by enabling cap_account_writeback on
> swapper_space, and thereby account swap writeback pages. Then the VM
> knows it has outstanding IO and need not panic.

Hmm, I'm not sure that would be right, because then those writeback
pages would be accounted twice: once for swapper_space, and once for
the real device.
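
To illustrate, very roughly, where the two accountings would come from
for a single page going out to swap (bdi_writeback_inc() is just a
made-up name standing in for whatever the per-bdi writeback accounting
helper ends up being in this patch set):

#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>

/* Illustration only, not a real patch. */
static void account_swap_writeback(struct page *page,
				   struct block_device *swap_bdev)
{
	/* counted once when the swap-cache page enters writeback ... */
	bdi_writeback_inc(page_mapping(page)->backing_dev_info);

	/* ... and once more against the device the bio actually hits */
	bdi_writeback_inc(blk_get_backing_dev_info(swap_bdev));
}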

So there's still a condition where lots of anonymous pages are turned
into swap-cache writeback pages, and we should somehow throttle that,
because
>>> This means that all memory is pinned and unreclaimable and the VM gets
>>> upset and goes oom.

although it's not quite clear in my mind how the VM gets upset about
this.

One way to throttle just the swapout activity is to do the per-bdi
accounting on swapper_space, and limit its number of writeback pages
to e.g. the global threshold + 10%.  That is basically what
throttle_vm_writeout() currently does, except that it does it
indiscriminately on all writeback pages, not just on swap writeback
pages.
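
Something like the following (only a sketch, in mm/page-writeback.c
terms: get_dirty_limits() and congestion_wait() are used the same way
throttle_vm_writeout() uses them today, and bdi_writeback_pages() is a
made-up helper standing in for the per-bdi writeback counter on
swapper_space's backing_dev_info):

#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/swap.h>

static void throttle_swap_writeout(void)
{
	long background_thresh;
	long dirty_thresh;

	for (;;) {
		get_dirty_limits(&background_thresh, &dirty_thresh, NULL);

		/* allow some slack over the global dirty threshold */
		dirty_thresh += dirty_thresh / 10;

		/* only look at writeback pages accounted to swapper_space */
		if (bdi_writeback_pages(swapper_space.backing_dev_info) <=
							dirty_thresh)
			break;

		congestion_wait(WRITE, HZ/10);
	}
}

That way ordinary writeback from a heavy "dd" would no longer trigger
this throttle in the reclaim path, but swapout still could not put
much more than the global threshold worth of memory under I/O at once.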

Does this make any sense?

Miklos