Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called

From: Dave Chinner
Date: Tue Apr 19 2016 - 19:02:34 EST


On Mon, Apr 18, 2016 at 03:46:46PM -0400, Waiman Long wrote:
> On 04/15/2016 06:19 PM, Dave Chinner wrote:
> >On Fri, Apr 15, 2016 at 01:17:41PM -0400, Waiman Long wrote:
> >>On 04/15/2016 04:17 AM, Dave Chinner wrote:
> >>>On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote:
> >>>>What the patch does is to eliminate the innermost
> >>>>inode_dio_begin/end pair.
> >>>Yes, and with that change inode_dio_wait() no longer waits for
> >>>AIO+DIO writes on ext4, hence breaking truncate IO barrier
> >>>requirements of inode_dio_wait().
> >>>
> >>>Cheers,
> >>>
> >>>Dave.
> >>You are right, and thanks for pointing this out to me. I think I focused too
> >>much on the dax_do_io() internals and didn't realize that inode_dio_end() can
> >>be deferred in __blockdev_direct_IO(). I will update my patch to eliminate
> >>the extra inode_dio_begin/end pair only for dax_do_io().
> >Even there, there is the risk that a future change will break ext4.
> >The ext4 code needs fixing first, then you can look at skipping the
> >DIO-based counting everywhere.
> >
> >i.e. fix the root cause of the problem, don't hack around it or
> >throw band-aids over it.
>
> I agree that the ext4 code needs fixing w.r.t. the problem that you
> found. That will take more time and testing. In the meantime, I
> think it is OK to pick the low-hanging fruit that is handled by my
> patch.

IOWs, you're saying that you won't fix the problem, because all you
care about is scalability results. This is how we end up with code
that breaks randomly in future because if it doesn't get fixed now,
nobody will fix the underlying problem. So, fix it now, fix it
properly and you still get your scalability improvement without
leaving a landmine that will explode on someone else in future.
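
For anyone following along, the barrier problem can be modelled in a
few lines of user-space C. This is only a sketch under my reading of
this thread, not kernel code: the model_* names and struct model_inode
are made up for illustration, a plain int stands in for the atomic
i_dio_count, and it assumes the outer inode_dio_begin/end pair is
dropped when submission returns while the inner reference is held
until IO completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode DIO accounting. */
struct model_inode {
	int dio_count;		/* models i_dio_count (not atomic here) */
	bool io_in_flight;	/* true until the modelled AIO completes */
};

static void model_dio_begin(struct model_inode *inode)
{
	inode->dio_count++;
}

static void model_dio_end(struct model_inode *inode)
{
	assert(inode->dio_count > 0);
	inode->dio_count--;
}

/* Models the wait condition: truncate proceeds only at a zero count. */
static bool model_dio_wait_would_return(const struct model_inode *inode)
{
	return inode->dio_count == 0;
}

/* Models AIO+DIO write submission, which returns with IO in flight. */
static void model_submit_aio_write(struct model_inode *inode,
				   bool skip_dio_count)
{
	model_dio_begin(inode);		/* outer pair taken by the fs */
	if (!skip_dio_count)
		model_dio_begin(inode);	/* inner ref, dropped at completion */
	inode->io_in_flight = true;
	model_dio_end(inode);		/* outer pair dropped at return */
}

/* Models deferred IO completion. */
static void model_complete_aio_write(struct model_inode *inode,
				     bool skip_dio_count)
{
	inode->io_in_flight = false;
	if (!skip_dio_count)
		model_dio_end(inode);
}

/* Returns true if truncate could run while the write is in flight. */
static bool model_truncate_races(bool skip_dio_count)
{
	struct model_inode inode = { 0, false };

	model_submit_aio_write(&inode, skip_dio_count);
	bool races = model_dio_wait_would_return(&inode) &&
		     inode.io_in_flight;
	model_complete_aio_write(&inode, skip_dio_count);
	return races;
}
```

In this model, model_truncate_races(false) reports no race because the
inner reference keeps the count non-zero until completion, while
model_truncate_races(true) reports one: the count hits zero with IO
still in flight, which is the window inode_dio_wait() would miss.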

Fix it now, fix it properly.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx