Re: [PATCH] 9p: use inode->i_lock to protect i_size_write()

From: Dominique Martinet
Date: Wed Jan 09 2019 - 23:18:32 EST


Hou Tao wrote on Thu, Jan 10, 2019:
> > Hmm, I'm not familiar with the read/write seqcount code for 32 bit but I
> > don't understand how locking here helps besides slowing things down (so
> > if the value is constantly updated, the read thread might have a chance
> > to be scheduled between two updates which was harder to do before ; and
> > thus "solving" your soft lockup)
>
> i_size_read() will call read_seqcount_begin() under 32-bit SMP environment,
> and it may loop in __read_seqcount_begin() infinitely because two or more
> invocations of write_seqcount_begin interleave and s->sequence becomes
> an odd number. It's noted in comments of i_size_write():
>
> /*
> Â* NOTE: unlike i_size_read(), i_size_write() does need locking around it
> Â* (normally i_mutex), otherwise on 32bit/SMP an update of i_size_seqcount
> Â* can be lost, resulting in subsequent i_size_read() calls spinning forever.
> Â*/

I see, I wasn't aware of how seqcount worked in details but it is a
simple seq++ with barrier so that does indeed need locking.
I had misunderstood the lockup reason as simply the value being updated
all the time.

> > Instead, a better fix would be to update v9fs_stat2inode to first read
> > the inode size, and only call i_size_write if it changed - I'd bet this
> > also fixes the problem and looks better than locking to me.
> > (Can also probably reuse stat->length instead of the following
> > i_size_read for i_blocks...)
>
> For read-only case, this fix will work. However if the inode size is changed
> constantly, there will be two or more callers of i_size_write() and the soft-lockup
> is still possible.

You are right. We can still do this as an optimization to not take the
lock in the read-only case, but otherwise it's safe to forget about this
comment.

> > On the other hand it might make sense to also lock the inode for
> > stat2inode because we're dealing with partially updated inodes at time,
> > but if we do this I'd rather put the locking in v9fs_stat2inode and not
> > outside of it to catch all the places where it's used; but the readers
> > don't lock so I'm not sure it makes much sense.
>
> Moving lock into v9fs_stat2inode() sounds reasonable. There are callers
> which don't need it (e.g. v9fs_qid_iget() uses it to fill attribute for a
> newly-created inode and v9fs_mount() uses it to fill attribute for root inode),
> so i will rename v9fs_stat2inode() to v9fs_stat2inode_nolock(), and wrap
> v9fs_stat2inode() upon v9fs_stat2inode_nolock().

If it's a newly created inode there is no contention and the spinlock
has virtually no cost ; and v9fs_mount's root inode is the same.
Let's keep this simple and always lock around i_size_write.

> > There's also a window during which the inode's nlink is dropped down to
> > 1 then set again appropriately if the extension is present; that's
> > rather ugly and we probably should only reset it to 1 if the attribute
> > wasn't set before... That can be another patch and/or I'll do it
> > eventually if you don't.
>
> I can not follow that. Do you mean inode->i_nlink may be updated concurrently
> by v9fs_stat2inode() and v9fs_remove() and that will lead to corruption of i_nlink ?

Not corruption, but the value can be incoherent for a short while.
If another thread looks at nlink continuously it can see the value being
reset to 1 for a short time when there really is no reason to.

I think this isn't as important as the other issues you've raised, so
don't worry about this point unless you want to.

> I also note a race about updating of v9inode->cache_validity. It seems
> that it is possible the clear of V9FS_INO_INVALID_ATTR in
> v9fs_remove() may lost if there are invocations of v9fs_vfs_getattr()
> in the same time. We may need to ensure V9FS_INO_INVALID_ATTR is
> enabled before clearing it atomically in v9fs_vfs_getattr() and i will
> send another patch for it.

There are many places setting this flag but that appears correct this
isn't safe.

It's more complicated than what you're saying though, e.g. even if the
flag was already set at the start of v9fs_vfs_getattr, if another thread
"sets it again" while it is running we need not to unset it -- so
basically this needs to be a 3-value:
- valid
- invalid
- invalid-being-refreshed

v9fs_invalidate_inode_attr would set it to invalid regardless of the
previous state.
start of v9fs_vfs_getattr would set it to invalid-being-refreshed (iff
it was invalid? always?)
end of v9fs_vfs_getattr/v9fs_stat2inode would set it back to valid iff
it is invalid-being-refreshed.


You cannot clear the flag at the start of v9fs_vfs_getattr() because
another getattr in parallel needs to know it cannot trust the cached
value; currently that parallel getattr call would issue another stat
request to the server but if we implement this we could wait until the
flag becomes valid (I'm not sure if it's appropriate to poll the value
even if we yield, so this might need a mutex or other lock that can
sleep though...); well, for now it's ok to behave like we currently do
and also proceed on to refresh the attribute in that other thread, we
can think about handling it better incrementally

Thanks,
--
Dominique