Re: [PATCH 4/6] Replace inode flush semaphore with a completion

From: Dave Chinner
Date: Wed Aug 13 2008 - 03:51:20 EST


On Tue, Aug 12, 2008 at 08:11:17PM -0700, Daniel Walker wrote:
> On Fri, 2008-06-27 at 18:44 +1000, Dave Chinner wrote:
> > Use the new completion flush code to implement the inode
> > flush lock. Removes one of the final users of semaphores
> > in the XFS code base.
> >
> > Version 2:
> > o make flock functions static inlines
> > o use new completion interfaces
>
> I was looking over this lock in XFS .. The iflock/ifunlock seem to be
> very much like mutexes in most of the calling locations.

Semaphores, not mutexes. The unlock most commonly happens in a
different context (i.e. I/O completion).
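
To make the distinction concrete, here is a minimal user-space sketch,
not XFS code. Every name in it (owned_lock, flush_lock, the ctx values)
is hypothetical and exists only to show the ownership rule: a
mutex-like lock must be released by whoever took it, while a
semaphore- or completion-like lock can be released from another context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution contexts, for illustration only. */
enum ctx { CTX_NONE, CTX_FLUSHER, CTX_IO_COMPLETION };

/* Mutex-like: remembers its owner and only the owner may unlock. */
struct owned_lock { bool held; enum ctx owner; };

/* Semaphore/completion-like: no owner, anyone may release it. */
struct flush_lock { bool held; };

static void owned_lock_acquire(struct owned_lock *l, enum ctx who)
{
    assert(!l->held);
    l->held = true;
    l->owner = who;
}

/* Returns false if the caller is not the owner (the mutex rule). */
static bool owned_unlock(struct owned_lock *l, enum ctx who)
{
    if (!l->held || l->owner != who)
        return false;
    l->held = false;
    l->owner = CTX_NONE;
    return true;
}

static void flush_lock_acquire(struct flush_lock *l)
{
    assert(!l->held);
    l->held = true;
}

/* Succeeds from any context, e.g. an I/O completion handler. */
static bool flush_unlock(struct flush_lock *l)
{
    if (!l->held)
        return false;
    l->held = false;
    return true;
}
```

In this model, a lock taken by the flusher and released at I/O
completion only works with the second kind of lock.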

> Where the lock
> happens at the start, and the unlock happens when the function calls
> bottom out. It seems like a better way to go would be to change from,
>
> xfs_ilock(uqp, XFS_ILOCK_EXCL);
> xfs_iflock(uqp);
> error = xfs_iflush(uqp, XFS_IFLUSH_SYNC);
>
> Where xfs_iflush eventually does the unlock to,
>
> xfs_ilock(uqp, XFS_ILOCK_EXCL);
> xfs_iflock(uqp);
> error = xfs_iflush(uqp, XFS_IFLUSH_SYNC);
> xfs_ifunlock(uqp);

Firstly, sync flushes are rare. Async flushes are common.

Right now we have the case where no matter what type of flush
is done, the caller does not have to worry about unlocking
the flush lock - it will be done as part of the flush. Your
suggestion makes that conditional based on whether we did a
sync flush or not.

So, what happens when you call:

xfs_iflush(ip, XFS_IFLUSH_DELWRI_ELSE_SYNC);

i.e. xfs_iflush() may do a delayed flush or a sync flush depending
on the current state of the inode. The caller has no idea what type
of flush was done, so will have no idea whether to unlock or not.
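
A rough user-space model of that problem, as a sketch only. The names
below (model_inode, model_iflush, the FLUSH_* modes) are hypothetical
stand-ins, and the must_sync field is an assumed placeholder for
whatever inode state makes xfs_iflush() pick a sync flush:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flush modes; not the real XFS_IFLUSH_* values. */
enum flush_mode { FLUSH_SYNC, FLUSH_DELWRI, FLUSH_DELWRI_ELSE_SYNC };

struct model_inode {
    bool flush_locked;  /* models the flush lock */
    bool must_sync;     /* assumed stand-in for the inode state check */
    bool io_pending;    /* buffer queued but not yet written */
};

static void model_iflock(struct model_inode *ip)
{
    assert(!ip->flush_locked);
    ip->flush_locked = true;
}

/* Models I/O completion: this is where the flush lock is dropped. */
static void model_io_done(struct model_inode *ip)
{
    assert(ip->flush_locked);
    ip->io_pending = false;
    ip->flush_locked = false;
}

/* The caller holds the flush lock on entry and never unlocks it. */
static void model_iflush(struct model_inode *ip, enum flush_mode mode)
{
    assert(ip->flush_locked);
    if (mode == FLUSH_DELWRI_ELSE_SYNC)
        mode = ip->must_sync ? FLUSH_SYNC : FLUSH_DELWRI;

    ip->io_pending = true;
    if (mode == FLUSH_SYNC)
        model_io_done(ip);  /* I/O finished before we return */
    /* delwri: model_io_done() runs later, from another context */
}
```

After model_iflush(ip, FLUSH_DELWRI_ELSE_SYNC) returns, the lock may or
may not still be held, and nothing in the return tells the caller which.
An unconditional unlock in the caller would be wrong in one of the two
cases.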

> And remove the unlocking from inside xfs_iflush(). Then use a flag to
> indicate that the flush is in progress, and a
> completion/wait_for_completion when another thread needs to wait on the
> flush to complete if it's an async flush.

And if it's a delayed flush? If we just wait for completion, we'll
have to wait for a long time before the xfsbufd times out the buffer
and pushes it to disk. This is important - the log AIL push code
does try-locks on the flush lock to determine if the inode is in a
delayed write state or not, and does an async buffer push instead
of xfs_iflush() to get it to disk immediately.

That is, there are three types of inode flushes (sync, async and
delwri) and the flush lock is used in different ways to determine
what action to take when writing back inodes. There's much more to
this 'flush lock' than just locking ;)
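
As a sketch of the try-lock usage described above (again a user-space
model with hypothetical names, not the actual AIL push code): a failed
try-lock is treated as "a flush is already outstanding, possibly
delwri", and the pusher pushes the buffer rather than starting a new
flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the real XFS structures or functions. */
struct ail_inode {
    bool flush_locked;   /* flush lock held by an earlier flush */
    bool flush_started;  /* the pusher started a new flush */
    bool buffer_pushed;  /* the pusher pushed the backing buffer */
};

enum push_action { PUSH_STARTED_FLUSH, PUSH_PUSHED_BUFFER };

/* Non-blocking attempt to take the flush lock. */
static bool ail_iflock_nowait(struct ail_inode *ip)
{
    if (ip->flush_locked)
        return false;
    ip->flush_locked = true;
    return true;
}

/* The try-lock result doubles as a state query on the inode. */
static enum push_action ail_push_inode(struct ail_inode *ip)
{
    if (ail_iflock_nowait(ip)) {
        /* Not being flushed: start a flush, which owns the lock. */
        ip->flush_started = true;
        return PUSH_STARTED_FLUSH;
    }
    /* Already flush locked: push the buffer out now, don't wait. */
    ip->buffer_pushed = true;
    return PUSH_PUSHED_BUFFER;
}
```

A plain "flag plus wait_for_completion" scheme would block in the
second branch instead of pushing the buffer.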

> That seems vastly more complex than your current patch, but I think it
> will be much cleaner ..

Doesn't seem that way to me...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx