Re: [PATCH v4 16/31] loop: use aio to perform io on the underlyingfile
From: Dave Kleikamp
Date: Mon Dec 03 2012 - 11:59:34 EST
On 11/22/2012 05:06 PM, Dave Chinner wrote:
> On Wed, Nov 21, 2012 at 04:40:56PM -0600, Dave Kleikamp wrote:
>> From: Zach Brown <zab@xxxxxxxxx>
>>
>> This uses the new kernel aio interface to process loopback IO by
>> submitting concurrent direct aio. Previously loop's IO was serialized
>> by synchronous processing in a thread.
>>
>> The aio operations specify the memory for the IO with the bio_vec arrays
>> directly instead of mappings of the pages.
>>
>> The use of aio operations is enabled when the backing file supports the
>> read_iter and write_iter methods. These methods must only be added when
>> O_DIRECT on bio_vecs is supported.
>>
>> Signed-off-by: Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>
>> Cc: Zach Brown <zab@xxxxxxxxx>
>
> I suspect aio iocompetion here doesn't work for FUA write IO.
Since the underlying file system is doing direct IO, we at least know
that the IO has been performed to the underlying device. It could still
be in the devices write cache, so maybe an fsync is still needed. It
wouldn't be hard to fix if that's the right thing to do.
>> +#ifdef CONFIG_AIO
>> +static void lo_rw_aio_complete(u64 data, long res)
>> +{
>> + struct bio *bio = (struct bio *)(uintptr_t)data;
>> +
>> + if (res > 0)
>> + res = 0;
>> + else if (res < 0)
>> + res = -EIO;
>> +
>> + bio_endio(bio, res);
>> +}
>
> This effectively does nothing...
It passes the IO completion from the underlying device to the caller.
>
>> @@ -413,37 +456,6 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
>> if (bio_rw(bio) == WRITE) {
>> struct file *file = lo->lo_backing_file;
>>
>> - if (bio->bi_rw & REQ_FLUSH) {
>> - ret = vfs_fsync(file, 0);
>> - if (unlikely(ret && ret != -EINVAL)) {
>> - ret = -EIO;
>> - goto out;
>> - }
>> - }
>> -
>> - /*
>> - * We use punch hole to reclaim the free space used by the
>> - * image a.k.a. discard. However we do not support discard if
>> - * encryption is enabled, because it may give an attacker
>> - * useful information.
>> - */
>> - if (bio->bi_rw & REQ_DISCARD) {
>> - struct file *file = lo->lo_backing_file;
>> - int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
>> -
>> - if ((!file->f_op->fallocate) ||
>> - lo->lo_encrypt_key_size) {
>> - ret = -EOPNOTSUPP;
>> - goto out;
>> - }
>> - ret = file->f_op->fallocate(file, mode, pos,
>> - bio->bi_size);
>> - if (unlikely(ret && ret != -EINVAL &&
>> - ret != -EOPNOTSUPP))
>> - ret = -EIO;
>> - goto out;
>> - }
>> -
>> ret = lo_send(lo, bio, pos);
>>
>> if ((bio->bi_rw & REQ_FUA) && !ret) {
>
> And as you can see here that after writing the data in the filebacked
> path, there's special handling for REQ_FUA (i.e. another fsync).
> ....
In this path, the data may still be in the underlying filesystem's page
cache.
>
>> @@ -512,7 +546,29 @@ static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio)
>> do_loop_switch(lo, bio->bi_private);
>> bio_put(bio);
>> } else {
>> - int ret = do_bio_filebacked(lo, bio);
>> + int ret;
>> +
>> + if (bio_rw(bio) == WRITE) {
>> + if (bio->bi_rw & REQ_FLUSH) {
>> + ret = vfs_fsync(lo->lo_backing_file, 1);
>> + if (unlikely(ret && ret != -EINVAL))
>> + goto out;
>> + }
>> + if (bio->bi_rw & REQ_DISCARD) {
>> + ret = lo_discard(lo, bio);
>> + goto out;
>> + }
>> + }
>> +#ifdef CONFIG_AIO
>> + if (lo->lo_flags & LO_FLAGS_USE_AIO &&
>> + lo->transfer == transfer_none) {
>> + ret = lo_rw_aio(lo, bio);
>> + if (ret == 0)
>> + return;
>> + } else
>> +#endif
>> + ret = do_bio_filebacked(lo, bio);
>> +out:
>
> And this extra fsync is now not done in the aio path. I.e. the AIO
> completion path needs to issue the fsync to maintain correct REQ_FUA
> semantics...
If this is really necessary, I can fix it.
>
> Cheers,
>
> Dave.
Thanks,
Shaggy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/