Re: [PATCH] direct-io: only inc/dec inode->i_dio_count for file systems

From: Jens Axboe
Date: Wed Apr 15 2015 - 16:09:21 EST


On 04/15/2015 01:46 PM, Andrew Morton wrote:
On Wed, 15 Apr 2015 13:26:51 -0600 Jens Axboe <axboe@xxxxxx> wrote:

Is there similar impact to direct-io-to-file? It would be nice to fix
that up also. Many filesystems do something along the lines of

atomic_inc(i_dio_count);
wibble()
atomic_dev(i_dio_count);
__blockdev_direct_IO(...);

and with your patch I think we could change them to

atomic_inc(i_dio_count);
wibble()
__blockdev_direct_IO(..., flags|DIO_IGNORE_TRUNCATE);
atomic_dev(i_dio_count);

which would halve the atomic op load.

I haven't checked pure file, but without extending, I suspect that we
should see similar benefits there. In any case, it'd make sense doing,
having twice the atomic inc/dec is just a bad idea in general, if we can
get rid of it.

A quick grep doesn't show this use case, or I'm just blind. Where do you
see that?

btrfs_direct_IO() holds i_dio_count across its call to
__blockdev_direct_IO() for writes. That makes the dio_bio_count
manipulation in do_blockdev_direct_IO() unneeded? ext4 is similar.

You are right, I even modified those.. Yes, that looks like it's unnecessary. Chris?

Reducing from 4 ops to 2 probably won't make as much difference as
reducing from 2 to 0 - most of the cost comes from initially grabbing that
cacheline from a different CPU.

It wont be a 50% reduction, but it all really depends a lot on timing. If you have enough banging on it, it could be close to a 50% reduction.


But that's piling hack on top of hack. Can we change the

I'd more view it as reducing the hack, the real hack is the way that we
manually do atomic_inc() on i_dio_count, and then call a magic
inode_dio_done() that decrements it again. It's not very pretty, I'm
just reducing the scope of the hack :-)

A magic flag which says "you don't need to do this in that case because
I know this inode is special". direct-io already has too much of this :(

Well, outside of rewriting the dio code, that's what we get. The flags already exist, and I don't disagree with you that the situation is generally a mess.

do_blockdev_direct_IO() interface to "caller shall hold i_mutex, or
increment i_dio_count"? ie: exclusion against truncate is wholly the
caller's responsibility. That way, this awkward sharing of
responsibility between caller and callee gets cleaned up and
DIO_IGNORE_TRUNCATE goes away.

That would clean it up, at the expense of touching more churn. I'd be
fine with doing it that way.

OK, could be done later I suppose.

It could easily be layered on top, doesn't have to be part of the initial patch. Once the flag is there, callers can do what they need and add the flag.

inode_dio_begin() would be a good place to assert that i_mutex is held,
btw.

How would you check it? If the i_dio_count is bumped, then you'd not
need to hold i_mutex.

if (atomic_add_return() == 1)
assert()

That's not a 100% catch, but probably good enough.

I guess. It was just a thought. Having wandered around the code, I'm
not 100% confident that everyone is holding i_mutex - it's not all
obviously correct.

Personally I'd just prefer not having to dive deeper into this for the benefit of this patch.

otoh, the caller doesn't *have* to choose i_mutex for the external
exclusion, and perhaps some callers have used something else.



--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/