On Tue, Mar 22, 2016 at 03:40:28PM -0600, Jens Axboe wrote:
> On 03/22/2016 03:34 PM, Dave Chinner wrote:
> > On Tue, Mar 22, 2016 at 11:55:16AM -0600, Jens Axboe wrote:
> > > If you call sync, the initial call to wakeup_flusher_threads() ends up
> > > calling wb_start_writeback() with reason=WB_REASON_SYNC, but
> > > wb_start_writeback() always uses WB_SYNC_NONE as the writeback mode.
> > > Ensure that we use WB_SYNC_ALL for a sync operation.
> >
> > This seems wrong to me. We want background write to happen as
> > quickly as possible and /not block/ when we first kick sync.
>
> It's not going to block. wakeup_flusher_threads() async queues
> writeback work through wb_start_writeback().
The flusher threads block, not the initial wakeup. e.g. they will
now block waiting for data writeback to complete before writing the
inode. i.e. this code in __writeback_single_inode() is now triggered
by the background flusher:
	/*
	 * Make sure to wait on the data before writing out the metadata.
	 * This is important for filesystems that modify metadata on data
	 * I/O completion. We don't do it for sync(2) writeback because it has a
	 * separate, external IO completion path and ->sync_fs for guaranteeing
	 * inode metadata is written back correctly.
	 */
	if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync) {
		int err = filemap_fdatawait(mapping);
		if (ret == 0)
			ret = err;
	}
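
To make that condition concrete, here is a minimal userspace sketch (not kernel code) of just the predicate. The names model_wbc and flusher_waits_on_data() are hypothetical and exist only for illustration; the only thing taken from the kernel is the sync_mode/for_sync test quoted above.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel's writeback modes (model only). */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Hypothetical, trimmed-down stand-in for struct writeback_control. */
struct model_wbc {
	enum model_sync_mode sync_mode;
	bool for_sync;	/* set only for sync(2)-driven writeback */
};

/*
 * Mirrors the test quoted from __writeback_single_inode(): data is
 * waited on only for WB_SYNC_ALL writeback that is not marked for_sync.
 */
static bool flusher_waits_on_data(const struct model_wbc *wbc)
{
	return wbc->sync_mode == MODEL_WB_SYNC_ALL && !wbc->for_sync;
}
```

Under this model, WB_SYNC_ALL work with for_sync clear (the case described above for work queued via wb_start_writeback()) makes the predicate true, while WB_SYNC_NONE work never waits.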
It also changes the writeback chunk size in write_cache_pages(), so
instead of doing a bit of writeback from all dirty inodes, it tries
to write everything from each inode in turn (see
writeback_chunk_size()) which will further exacerbate the wait
above.
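
The chunk size effect can be sketched roughly as follows. This is a simplified userspace model, not the kernel's writeback_chunk_size(): model_chunk_size(), its parameters and the MODEL_MIN_CHUNK_PAGES floor are assumptions made for illustration, and the real calculation has more inputs. It only shows the shape of the behaviour: WB_SYNC_ALL removes the per-inode cap, while WB_SYNC_NONE writes a bounded chunk and moves on to the next inode.

```c
#include <limits.h>

/* Userspace stand-ins for the kernel's writeback modes (model only). */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Assumed floor for a background chunk, in pages; value is illustrative. */
#define MODEL_MIN_CHUNK_PAGES 1024L

/*
 * Hypothetical model: how many pages to write from one inode before
 * moving on to the next dirty inode.
 */
static long model_chunk_size(enum model_sync_mode mode,
			     long bandwidth_pages, long nr_pages)
{
	long pages;

	if (mode == MODEL_WB_SYNC_ALL)
		return LONG_MAX;	/* no cap: write the whole inode */

	pages = bandwidth_pages / 2;	/* bounded by estimated bandwidth */
	if (pages > nr_pages)
		pages = nr_pages;	/* never more than was requested */
	if (pages < MODEL_MIN_CHUNK_PAGES)
		pages = MODEL_MIN_CHUNK_PAGES;
	return pages;
}
```

With an assumed bandwidth of 8192 pages and 100000 pages requested, the model returns 4096 for MODEL_WB_SYNC_NONE and LONG_MAX for MODEL_WB_SYNC_ALL, which is the "everything from each inode in turn" behaviour described above.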
> > The latter blocking passes of sync use WB_SYNC_ALL to ensure that we
> > block waiting for all remaining IO to be issued and waited on, but
> > the background writeback doesn't need to do this.
>
> That's fine, they can get to wait on the previously issued IO, which
> was properly submitted with WB_SYNC_ALL.
>
> Maybe I'm missing your point?
Making the background flusher block and wait for data makes it
completely ineffective in speeding up sync() processing...