Re: [PATCH] fuse: force sync attr when inode is invalidated

From: Wu Bo
Date: Tue Jun 28 2022 - 00:20:37 EST


Hi Vivek and Jiachen,

Sorry for late reply. It took time to build open email infrastructure in office.

> On Fri, Jun 24, 2022 at 10:27 AM Jiachen Zhang
> <zhangjiachen.jaycee@xxxxxxxxxxxxx> wrote:
>>
>> On Fri, Jun 24, 2022 at 3:26 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Jun 21, 2022 at 08:56:51PM +0800, wubo wrote:
>>>> From: Wu Bo <bo.wu@xxxxxxxx>
>>>>
>>>> Now the fuse driver only trust it's local inode size when
>>>> writeback_cache is enabled. Even the userspace server tell the driver
>>>> the inode cache is invalidated, the size attrabute will not update. And
>>>> will keep it's out-of-date size till the inode cache is dropped. This is
>>>> not reasonable.
>>>
>>> BTW, can you give more details about what's the use case. With
>>> writeback_cache, writes can be cached in fuse and not sent to
>>> file server immediately. And I think that's why fuse trusts
>>> local i_size.
>>>

Let me introduce this use case. It's widely used in Android11 now.
Android11 use fuse for APP to access the public files in order to support the
dynamic permission. To increase the performance at the same time, Android11
allow APPs to write big size files(e.g. jpg or mp4) directly to the lower fs
bypass fuse. So this issue come out:

APP-a APP-b
Write a jpg file through lowfs
Read the jpg file, size is 1M
Fish write
Update the size 2M in DB
Invalidate the fuse entry
Size keep to be 1M

[ Drop the inode cache manually ]

Size be update to 2M

If the inode cache keep in kernel, the size will not update forever because of
the writeback mode.

>>> With writeback_cache enabled, I don't think file should be modified
>>> externally (outside the fuse client).
>>>
>>> So what's that use case where file size cached in fuse is out of
>>> date. You probably should not use writeback_cache if you are
>>> modifying files outside the fuse client.

As we all know enable writeback mode can improve the performance. So this
feature can't be disabled in Android. And the key cause of this issue is that
writeback mode in fuse is not so reasonable now. And there are many methods to
fix this issue.

>>>
>>> Having said that I am not sure why FUSE_NOTIFY_INVAL_INODE was added to
>>> begin with. If files are not supposed to be modifed outside the fuse
>>> client, why are we dropping acls and invalidating attrs. If intent is
>>> just to drop page cache, then it should have been just that nothing
>>> else.
>>>

I proposal force attr invalidation. Because I think if user call the
FUSE_NOTIFY_INVAL_INODE as Android11 do it, they definately know what they are
doing. And the fuse driver should do the expected response.

>>> So up to some extent, FUSE_NOTIFY_INVAL_INODE is somewhat confusing. Would
>>> have been good if there was some documentation for it.
>>>
>>> Thanks
>>> Vivek
>>>
>>
>> Hi Wu and Vivek,
>>
>> Recently, we have had some discussions about the writeback_cache
>> revalidation on the mailing list [1][2]. Miklos gave his initial
>> patchset about writeback_cache v2, which supports c/mtime and size
>> updates [1]. However, those methods do not make use of reverse
>> messages, as virtio-fs does not support reverse notification yet. I'm
>> going to send out a new version of that patch based on the discussion
>> and with more considerations.
>
> The new patch:
> https://lore.kernel.org/linux-fsdevel/20220624055825.29183-1-zhangjiachen.jaycee@xxxxxxxxxxxxx/
>
> Thanks,
> Jiachen
>
>>
>> I also agree that, semantically, FUSE_NOTIFY_INVAL_INODE should
>> invalidate i_size as well. So I think this patch is a good supplement
>> for FUSE_NOTIFY_INVAL_INODE. But we need to be more careful as the
>> size can be updated from server to kernel, and from kernel to server.
>> I will leave some comments about such issues in the following code.
>>
>> For the use case, writeback_cache is superb over write-through mode in
>> write-intensive scenarios, but its consistency among multiple clients
>> is too bad (almost no consistency). I think it's good to give a little
>> more consistency to writeback_cache.
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20220325132126.61949-1-zhangjiachen.jaycee@xxxxxxxxxxxxx/
>> [2] https://lore.kernel.org/linux-fsdevel/20220608104202.19461-1-zhangjiachen.jaycee@xxxxxxxxxxxxx/

This is a good solution to this issue. But I think we don't need another
writeback_cache mode, it's time to do the cache consistency under
writeback_cache mode.

>>
>>>>
>>>> Signed-off-by: Wu Bo <bo.wu@xxxxxxxx>
>>>> ---
>>>> fs/fuse/inode.c | 10 +++++++++-
>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>>>> index 8c0665c5dff8..a4e62c7f2b83 100644
>>>> --- a/fs/fuse/inode.c
>>>> +++ b/fs/fuse/inode.c
>>>> @@ -162,6 +162,11 @@ static ino_t fuse_squash_ino(u64 ino64)
>>>> return ino;
>>>> }
>>>>
>>>> +static bool fuse_force_sync(struct fuse_inode *fi)
>>>> +{
>>>> + return fi->i_time == 0;
>>>> +}
>>>> +
>>>> void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>>>> u64 attr_valid, u32 cache_mask)
>>>> {
>>>> @@ -222,8 +227,10 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>>>> u32 fuse_get_cache_mask(struct inode *inode)
>>>> {
>>>> struct fuse_conn *fc = get_fuse_conn(inode);
>>>> + struct fuse_inode *fi = get_fuse_inode(inode);
>>>> + bool is_force_sync = fuse_force_sync(fi);
>>>>
>>>> - if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
>>>> + if (!fc->writeback_cache || !S_ISREG(inode->i_mode) || is_force_sync)
>>>> return 0;
>>>>
>>>> return STATX_MTIME | STATX_CTIME | STATX_SIZE;
>>>> @@ -437,6 +444,7 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
>>>> fi = get_fuse_inode(inode);
>>>> spin_lock(&fi->lock);
>>>> fi->attr_version = atomic64_inc_return(&fc->attr_version);
>>>> + fi->i_time = 0;
>>>> spin_unlock(&fi->lock);
>>
>> Seems fuse_reverse_inval_inode() only drops page cache from offset to
>> offset+len, should we only invalidate i_time on a full cache drop?
>> Otherwise, as the server size is stale, the users may see a file is
>> truncated.
>>
>> Also, what if a FUSE_GETATTR request gets the attr_version after
>> fuse_reverse_inval_inode() increases it, but tries to update i_size
>> after the invalidate_inode_pages2_range() in
>> fuse_reverse_inval_inode()? In this case, server_size can be updated
>> by invalidate_inode_pages2_range(), and FUSE_GETATTR might gets a
>> stale server_size. Meanwhile, as FUSE_GETATTR has got the newest
>> attr_version, the kernel_size will still be updated. This can cause
>> false truncation even for a single FUSE client. So we may need to do
>> more about the attr_version in writeback mode.
>>
>> Thanks,
>> Jiachen
>>
>>>>
>>>> fuse_invalidate_attr(inode);
>>>> --
>>>> 2.35.1
>>>>
>>>
>