Re: [PATCH 0/7] Per-bdi writeback flusher threads v20

From: Jan Kara
Date: Tue Sep 22 2009 - 07:45:44 EST


On Tue 22-09-09 07:30:55, Chris Mason wrote:
> > Yes a more general solution would help. I'd like to propose one which
> > works in the other way round. In brief,
> > (1) the VFS gives a large enough per-file writeback quota to btrfs;
> > (2) btrfs tells the VFS "here is a (seek) boundary, stop voluntarily"
> > before exhausting the quota and being force-stopped.
> >
> > There will be two limits (the second one is new):
> >
> > - total nr to write in one wb_writeback invocation
> > - _max_ nr to write per file (before switching to sync the next inode)
> >
> > The per-invocation limit is useful for balance_dirty_pages().
> > The per-file number can be accumulated across successive wb_writeback
> > invocations and thus can be much larger (eg. 128MB) than the legacy
> > per-invocation number.
> >
> > The file system will only see the per-file numbers. The "max" means
> > that if btrfs finds the current page to be the last page in the extent,
> > it can indicate this fact to the VFS by setting wbc->would_seek=1. The
> > VFS will then switch to writing the next inode.
> >
> > The benefit of an early voluntary yield is that it reduces the chance
> > of being force-stopped halfway through an extent. The next time the
> > VFS returns to sync this inode, it will again be granted the full
> > 128MB quota, which should be enough to cover a big fresh extent.
>
> This is interesting, but it gets into the problem of defining what a
> seek is. On some hardware seeks are very fast and don't hurt at all. It
> might be more interesting to use timeslices instead.
With simple timeslices there's the problem that the time it takes to submit
an IO isn't really related to the time it takes to complete the IO. During
submission we are limited only by the availability of free requests and the
sizes of the request queues (which might be filled by another thread, or by
us writing a different inode).
But as I described in my other email, we could probably estimate the time
it takes to complete the IO. At least CFQ keeps the statistics needed for
that. If we somehow generalized them and put them into the BDI, we could
probably use them during writeback...

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--