Re: [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr

From: Curt Wohlgemuth
Date: Mon Jul 11 2011 - 15:51:20 EST


On Mon, Jul 11, 2011 at 12:48 PM, Jan Kara <jack@xxxxxxx> wrote:
> On Mon 11-07-11 10:11:17, Curt Wohlgemuth wrote:
>> >> One other issue I have with sync as it's structured is that we don't
>> >> do a WB_SYNC_ALL pass on any inode that's only associated with a block
>> >> device, and not on a mounted filesystem.  Blockdev mounts are
>> >> pseudo-mounts, and are explicitly skipped in __sync_filesystem().  So
>> >> if you've written directly to a block device and do a sync, the only
>> >> pass over the pages for this inode are via the
>> >> wakeup_flusher_threads() -- which operates on a BDI, regardless of the
>> >> superblock, and uses WB_SYNC_NONE.
>> >>
>> >> All the sync_filesystem() calls are per-sb, not per-BDI, and they'll
>> >> exclude pseudo-superblocks.
>> >>
>> >> I've seen cases in our modified kernels here at Google in which
>> >> lilo/shutdown failed because of a lack of WB_SYNC_ALL writeback for
>> >> /dev/sda (though I haven't been able to come up with a consistent test
>> >> case, nor reproduce this on an upstream kernel).
>> >  Ho hum, I wonder what would be the cleanest way to fix this. I guess we
>> > could make an exception for 'bdev' superblock in the code but it's ugly.
>>
>> Yes, it's ugly.
>>
>> You could declare that sync(2) is just doing it's job, that it's not
>> designed to write dirty pages from devices that don't have filesystems
>> mounted on them; that's what Christoph seems to be saying, and it's
>> what the man pages for sync(2) that I've seen say as well.  But
>> everybody I've talked to about this is surprised, so you could declare
>> it a bug as well :-) .
>>
>> It seems to me that sys_sync *could* just dispense with iterating over
>> the superblocks, and just iterate over the BDIs, just as
>> wakeup_flusher_threads() does.  I.e., just do two passes over all BDIs
>> -- one with WB_SYNC_NONE, and one with WB_SYNC_ALL.  Well, not quite.
>> It would still have to go to each SB and call the quota_sync and
>> sync_fs callbacks, but those should be cheap.
>  Well, they're not exactly cheap (journaling filesystems have to force
> transaction commits and wait for them) but that does not really matter.
> The real problem is that to wait for IO completion you need some list of
> inodes you want to wait for and you can currently get such list only from
> superblock.

Ah, of course, I realized that some days ago but forgot it --
something that seems to happen more than I'd like to admit to myself.

Thanks,
Curt

>
>> And yes, this would make sys_sync and sys_syncfs more different than
>> they are today.
>
>                                                                Honza
> --
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/