Re: [PATCH v5 1/2] dax: Don't touch i_dio_count in dax_do_io()

From: Jan Kara
Date: Thu May 05 2016 - 10:16:44 EST


On Fri 29-04-16 12:27:55, Waiman Long wrote:
> The purpose of the i_dio_count is to protect against truncation while
> the I/O operation is in progress. As dax_do_io() only does synchronous
> I/O, the locking performed by the caller or within dax_do_io() for
> read should be enough to protect it against truncation. There is no
> need to touch the i_dio_count.
>
> Eliminating two atomic operations can sometimes give a noticeable
> improvement in I/O performance as NVDIMM is much faster than other
> disk devices.
>
> Suggested-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>

We cannot easily do this currently - the reason is that in several places we
wait for i_dio_count to drop to 0 (look for inode_dio_wait()) while
holding i_mutex to wait for all outstanding DIO / DAX IO. You'd break this
logic with this patch.

If we indeed put all writes under i_mutex, this problem would go away but
as Dave explains in his email, we consciously do as much IO as we can
without i_mutex to allow reasonable scalability of multiple writers into
the same file.

The downside of that is that overwrites and writes vs reads are not atomic
wrt each other as POSIX requires. It has been that way for direct IO in XFS
case for a long time, with DAX this non-conforming behavior is proliferating
more. I agree that's not ideal but serializing all writes on a file is
rather harsh for persistent memory as well...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR