Re: [PATCH 1/3] direct-io: only inc/dec inode->i_dio_count for file systems

From: Dave Chinner
Date: Wed Apr 15 2015 - 18:37:15 EST


On Wed, Apr 15, 2015 at 04:01:36PM -0600, Jens Axboe wrote:
> do_blockdev_direct_IO() increments and decrements the inode
> ->i_dio_count for each IO operation. It does this to protect against
> truncate of a file. Block devices don't need this sort of protection.
>
> For a capable multiqueue setup, this atomic int is the only shared
> state between applications accessing the device for O_DIRECT, and it
> presents a scaling wall for that. In my testing, as much as 30% of
> system time is spent incrementing and decrementing this value. A mixed
> read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
> better latencies too. Before:
.....
> diff --git a/fs/inode.c b/fs/inode.c
> index f00b16f45507..c4901c40ad65 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1946,18 +1946,31 @@ void inode_dio_wait(struct inode *inode)
> EXPORT_SYMBOL(inode_dio_wait);
>
> /*
> - * inode_dio_done - signal finish of a direct I/O requests
> + * inode_dio_begin - signal start of a direct I/O requests
> * @inode: inode the direct I/O happens on
> *
> * This is called once we've finished processing a direct I/O request,
> * and is used to wake up callers waiting for direct I/O to be quiesced.
> */
> -void inode_dio_done(struct inode *inode)
> +void inode_dio_inc(struct inode *inode)

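The function name does not match the docbook comment: the comment
says inode_dio_begin(), the function is inode_dio_inc(). The
description text is also still the old "finished processing" wording
from inode_dio_done().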

> +{
> + atomic_inc(&inode->i_dio_count);
> +}
> +EXPORT_SYMBOL(inode_dio_inc);
> +
> +/*
> + * inode_dio_dec - signal finish of a direct I/O requests
> + * @inode: inode the direct I/O happens on
> + *
> + * This is called once we've finished processing a direct I/O request,
> + * and is used to wake up callers waiting for direct I/O to be quiesced.
> + */
> +void inode_dio_dec(struct inode *inode)
> {
> if (atomic_dec_and_test(&inode->i_dio_count))
> wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
> }
> -EXPORT_SYMBOL(inode_dio_done);
> +EXPORT_SYMBOL(inode_dio_dec);

Bikeshedding: I think this would be better suited to inode_dio_begin()
and inode_dio_end() because now we are trying to say "this is where
the DIO starts, and this is where it ends". It's not really a
"reference counting" interface; we're trying to annotate the
boundaries of where DIO is protected against truncate.

And, realistically, if we are pushing this up into the filesystems
again, we should push it up into *all* filesystems and get rid of it
completely from the DIO layer. That way no new twisty passages in
the direct IO code are needed.
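
As a rough illustration of the begin/end naming, here is a userspace
sketch, not kernel code. It assumes a stand-in struct fake_inode that
models only the counter, a wakeups field that stands in for
wake_up_bit(), and a made-up fake_fs_direct_io() entry point. None of
these names exist in the kernel; only inode_dio_begin() and
inode_dio_end() are the names suggested above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct inode; only the DIO counter is modelled. */
struct fake_inode {
	atomic_int i_dio_count;
	int wakeups;	/* stands in for wake_up_bit() calls */
};

/* Mark the start of the region where DIO is protected against truncate. */
static void inode_dio_begin(struct fake_inode *inode)
{
	atomic_fetch_add(&inode->i_dio_count, 1);
}

/* Mark the end of that region; "wake" waiters when the last DIO finishes. */
static void inode_dio_end(struct fake_inode *inode)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	int old = atomic_fetch_sub(&inode->i_dio_count, 1);

	assert(old > 0);	/* an end without a matching begin is a bug */
	if (old == 1)
		inode->wakeups++;
}

/*
 * Hypothetical filesystem direct IO path: the filesystem, not the
 * generic DIO layer, brackets the IO with begin/end.
 */
static int fake_fs_direct_io(struct fake_inode *inode)
{
	inode_dio_begin(inode);
	/* ... IO submission and completion would happen here ... */
	inode_dio_end(inode);
	return 0;
}
```

In this model a block device path would simply never call the
begin/end pair, so the shared counter is not touched at all.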

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx