Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems

From: Andrew Morton
Date: Wed Apr 15 2015 - 15:47:15 EST

On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@xxxxxx> wrote:

> > Is there similar impact to direct-io-to-file? It would be nice to fix
> > that up also. Many filesystems do something along the lines of
> >
> > atomic_inc(i_dio_count);
> > wibble()
> > atomic_dev(i_dio_count);
> > __blockdev_direct_IO(...);
> >
> > and with your patch I think we could change them to
> >
> > atomic_inc(i_dio_count);
> > wibble()
> > __blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
> > atomic_dev(i_dio_count);
> >
> > which would halve the atomic op load.
> I haven't checked pure file, but without extending, I suspect that we
> should see similar benefits there. In any case, it'd make sense doing,
> having twice the atomic inc/dec is just a bad idea in general, if we can
> get rid of it.
> A quick grep doesn't show this use case, or I'm just blind. Where do you
> see that?

btrfs_direct_IO() holds i_dio_count across its call to
__blockdev_direct_IO() for writes. That makes the dio_bio_count
manipulation in do_blockdev_direct_IO() unneeded? ext4 is similar.

Reducing from 4 ops to 2 probably won't make as much difference as
reducing from 2 to 0 - most of the cost comes from initially grabbing that
cacheline from a different CPU.

> > But that's piling hack on top of hack. Can we change the
> I'd more view it as reducing the hack, the real hack is the way that we
> manually do atomic_inc() on i_dio_count, and then call a magic
> inode_dio_done() that decrements it again. It's not very pretty, I'm
> just reducing the scope of the hack :-)

A magic flag which says "you don't need to do this in that case because
I know this inode is special". direct-io already has too much of this :(

> > do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
> > increment i_dio_count"? ie: exclusion against truncate is wholly the
> > caller's responsibility. That way, this awkward sharing of
> > responsibility between caller and callee gets cleaned up and
> > DIO_IGNORE_TRUNCATE goes away.
> That would clean it up, at the expense of touching more churn. I'd be
> fine with doing it that way.

OK, could be done later I suppose.

> > inode_dio_begin() would be a good place to assert that i_mutex is held,
> > btw.
> How would you check it? If the i_dio_count is bumped, then you'd not
> need to hold i_mutex.

if (atomic_add_return() == 1)

I guess. It was just a thought. Having wandered around the code, I'm
not 100% confident that everyone is holding i_mutex - it's not all
obviously correct.

otoh, the caller doesn't *have* to choose i_mutex for the external
exclusion, and perhaps some callers have used something else.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at