Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()

From: Steve Lord
Date: Sun Feb 01 2004 - 11:43:12 EST


Christoph Hellwig wrote:
On Sat, Jan 31, 2004 at 09:59:06AM -0600, Steve Lord wrote:

The vn_revalidate call is called out of linvfs_setattr,
which is called with the i_sem held, it is also potentially called out
of linvfs_getattr, although since the i_size is always maintained
as it is changed, this call should not actually be updating the size.
Possibly changing the code in vn_revalidate to do this:

if (i_size_read(inode) < va.va_size))
i_size_write(inode, va.va_size);

Would be a good starting point, I suspect those calls from the nfs
revalidate call are not really going to change the inode size. My
guess is this will make your problem go away.

Probably some larger code restructure is needed so that revalidate
knows if the i_sem is held or not at this point.


I think the right fix is to update the Linux i_size always shortly after
di_size is updated. There's a lot of updates in the directory code while
should be handled by an i_size_write in the matching linvfs routines, and
there's a few more but we should be able to handle those without
vn_revalidate aswell.


Changing the validate_fields call to use

if (i_size_read(inode) != va.va_size)
i_size_write(inode, va.va_size);

should take care of directories, certainly better than the
direct assignment to i_size which is in there now....
This is called from the directory ops which modify the
contents of a directory and should already be under the
i_sem. The vn_revalidate code should use a != test
as well of course...



The O_DIRECT write case is the hard one. In XFS's internal view of
the world, the inode size is maintained via the XFS_ILOCK, but we
only hold that across metadata manipulation within the fs code,
not across I/O such as a call to generic_file_aio_write_nolock.
Right now the only way I see of dealing with that is to make
writes which we know will extent the file hold the i_sem for
the duration in the O_DIRECT case.


That's the more difficult case. Any reson why you'd hold i_sem
for the whole O_DIRECT I/O instead of just for updating i_sem?


We worked hard not to hold it, but that i_size_write in
the middle of the async_io code is tough to work around.

Note that the EOF zeroing code also calls i_size_write.


But that is called out of an extending setattr or a buffered write which
hold the semaphore. Hmm, actually O_DIRECT write can be in there as
well, but that is the same problem as the generic I/O path call.

Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/