Re: [PATCH 21/23] mm: per device dirty threshold
From: Peter Zijlstra
Date: Wed Sep 12 2007 - 04:46:19 EST
On Tue, 2007-09-11 at 22:36 -0400, John Stoffel wrote:
> Peter> Scale writeback cache per backing device, proportional to its
> Peter> writeout speed. By decoupling the BDI dirty thresholds a
> Peter> number of problems we currently have will go away, namely:
>
> Ah, this clarifies my questions! Thanks!
>
> Peter> - mutual interference starvation (for any number of BDIs);
> Peter> - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).
>
> Peter> It might be that all dirty pages are for a single BDI while
> Peter> other BDIs are idling. By giving each BDI a 'fair' share of the
> Peter> dirty limit, each one can have dirty pages outstanding and make
> Peter> progress.
>
> Question, can you change (shrink) the limit on a BDI while it has IO
> in flight? And what will that do to the system? I.e. if you have one
> device doing IO, so that it has a majority of the dirty limit. Then
> another device starts IO, and it's a *faster* device, how
> quickly/slowly does the BDI dirty limits change for both the old and
> new device?
Yes, it can change while in use. A measure of how quickly it can change
is roughly: it can half in a dirty_limit worth of writeout.
What will happen is that those processes doing heavy IO on the slower
device will get throttled more aggressively until its below its new
threshold again - however all the time it will keep on writing at (full)
speed because it will have this backlog to rid itself of, and by doing
that it completes writeouts which ensure it will keep part of the dirty
limit for itself, and thus can always make progress.
You can monitor this by looking at /sys/block/sd*/queue/cache_size while
doing such a thing. It should stabilise quite 'quickly'.
> Peter> A global threshold also creates a deadlock for stacked BDIs;
> Peter> when A writes to B, and A generates enough dirty pages to get
> Peter> throttled, B will never start writeback until the dirty pages
> Peter> go away. Again, by giving each BDI its own 'independent' dirty
> Peter> limit, this problem is avoided.
>
> Peter> So the problem is to determine how to distribute the total
> Peter> dirty limit across the BDIs fairly and efficiently. A DBI that
>
> You mean BDI here, not DBI.
Uhh, yeah, obviously :-)
> Peter> has a large dirty limit but does not have any dirty pages
> Peter> outstanding is a waste.
>
> Peter> What is done is to keep a floating proportion between the DBIs
> Peter> based on writeback completions. This way faster/more active
> Peter> devices get a larger share than slower/idle devices.
>
> Does a slower device get a BDI which is calculated to keep it's limit
> under a certain number of seconds of outstanding IO? This way no
> device can build up more than say 15 seconds of outstanding IO to
> flush at any one time.
Perhaps already answered above, as long as there is dirty stuff to write
out it will keep completing writes and thus gain a stable share of the
dirty limit.
Attachment:
signature.asc
Description: This is a digitally signed message part