Re: [PATCH block/for-linus] writeback: fix syncing of I_DIRTY_TIME inodes

From: Dave Chinner
Date: Mon Aug 24 2015 - 18:57:52 EST

On Mon, Aug 24, 2015 at 05:45:35PM -0400, Tejun Heo wrote:
> Hello,
> On Mon, Aug 24, 2015 at 11:09:27PM +0200, Jan Kara wrote:
> > It is inefficient, yes. But note that 'writeback' and 'dirty' states are
> > completely independent. Page can be in any of the !dirty & !writeback,
> That isn't true for pages being dirtied through set_page_dirty().
> It's guaranteed that a dirty inode remains on one of the b_* lists
> till there's no dirty page and writeback is complete.

IO submission calls clear_page_dirty_for_io(), so by the time that
all the pages have been submitted for IO, there are no dirty pages.
IO submission also calls set_page_writeback() once the filesystem
has decided to do IO on the page, and then IO completion calls
end_page_writeback() to clear that state.

IOWs, the page transitions from (dirty && !writeback) before
submission to (!dirty && writeback) after submission, and to (!dirty
&& !writeback) once IO completion occurs.

And you'll note that filemap_fdatawait() blocks on pages with the
PAGECACHE_TAG_WRITEBACK tag set in the mapping tree, not dirty
pages. Hence sync has to wait for all pages to transition out of
writeback before we can consider the inode to be clean.

> > dirty & !writeback, !dirty & writeback, dirty & writeback states. So mixing
> > tracking of writeback and dirty state of an inode just makes the code even
> > messier.
> I'm curious where and why they would deviate. Can you give me some
> examples? AFAICS, anything which uses the usual set_page_dirty() path
> shouldn't do that.

mmaped files.

page_mkwrite dirties page (dirty && !writeback)
writepage clear_page_dirty_for_io (!dirty && !writeback)
writepage starts writeback (!dirty && writeback)
page_mkwrite dirties page (dirty && writeback)
io completes (dirty && !writeback)

This is done so we don't lose dirty state from page faults whilst
the page is under IO and hence have sync miss the page next time

Of course, this behaviour is different if you have a filesystem or
block device that requires stable pages (e.g. btrfs for data CRC
validity). In this case, the page fault will block until the
writeback state goes away...

> > > > a list to track inodes with pages under writeback but they clashed with
> > > > your patch series and they didn't get rebased yet AFAIR.
> > >
> > > Wouldn't it make more sense to simply put them on one of the existing
> > > b_* lists?
> >
> > Logically it just doesn't make sense because as I wrote above dirty and
> > writeback states are completely independent. Also you'd have to detect &
> > skip inodes that don't really have any dirty pages to write and all the
> > detection of "is there any data to write" would get more complicated. A
> > separate list for inodes under writeback as Josef did is IMO the cleanest
> > solution.
> Given that the usual code path tracks dirty and writeback together, I
> don't think it's nonsensical; however, I'm more curious how common
> writeback w/o dirtying case is.

I suspect you've misunderstood the progression here. You can't get
writeback without first going through dirty. But the transition to
writeback clears the dirty page state so that we can capture page
state changes while writeback is in progress.


Dave Chinner
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at