On Tue, Jun 07, 2022 at 11:42:16PM +0200, Bernd Schubert wrote:
On 6/7/22 23:25, Vivek Goyal wrote:
On Sun, Jun 05, 2022 at 12:52:00PM +0530, Dharmendra Singh wrote:
From: Dharmendra Singh <dsingh@xxxxxxx>
In general, as of now, in FUSE, direct writes on the same file are
serialized over the inode lock, i.e. we hold the inode lock for the full
duration of the write request. I could not find a comment in the fuse code
which clearly explains why this exclusive lock is taken for direct writes.
The following might be among the reasons for acquiring the exclusive lock,
but they are not limited to:
1) Our guess is that some user-space fuse implementations might be relying
on this lock for serialization.
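Roughly, the current behaviour can be modelled like this (a toy user-space
sketch of the locking pattern, not the actual fs/fuse code; names are made
up):

/*
 * Toy user-space model of the current behaviour (not the actual
 * fs/fuse/file.c code): every direct write takes the "inode lock"
 * exclusively and holds it for the whole request, so two writers on
 * the same file never run in parallel.
 */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

struct toy_inode {
	pthread_rwlock_t lock;	/* stands in for the VFS inode lock */
	int backing_fd;		/* file the writes actually land in */
};

static ssize_t toy_serialized_direct_write(struct toy_inode *ino,
					   const void *buf, size_t len,
					   off_t off)
{
	ssize_t ret;

	pthread_rwlock_wrlock(&ino->lock);	/* exclusive for the full request */
	ret = pwrite(ino->backing_fd, buf, len, off);
	pthread_rwlock_unlock(&ino->lock);
	return ret;
}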
Hi Dharmendra,
I will just try to be devil's advocate. So if this is a server-side
limitation, then it is possible that the fuse client's cached i_size is
stale. For example, the filesystem is shared between two clients:
- File size is 4G as seen by client A.
- Client B truncates the file to 2G.
- Two processes in client A try to do parallel direct writes; both will
be able to proceed, and the server will get two parallel writes, both
extending the file size.
I can see that this can happen with virtiofs with cache=auto policy.
IOW, if this is a fuse server-side limitation, then how do you ensure
that the fuse kernel's view of i_size is not stale?
Hi Vivek,
I'm sorry, just to be sure, can you explain where exactly a client is
located for you? For us these are multiple daemons linked to libfuse -
which you seem to call the 'server'. Typically these clients are on
different machines, and servers are for us on the other side of the
network - like an NFS server.
Hi Bernd,
Agreed, terminology is little confusing. I am calling "fuse kernel" as
client and fuse daemon (user space) as server. This server in turn might
be the client to another network filesystem and real files might be
served by that server on network.
So for the simple virtiofs case, there can be two fuse daemons (virtiofsd
instances) sharing the same directory (either on a local filesystem or on
a network filesystem).
So now, while I'm not sure what you mean by 'client', I'm wondering about
two generic questions:
a) I need to double check, but we were under the assumption that the code
in question is the direct-io code path. I assume cache=auto would use the
page cache and should not be affected?
By default cache=auto uses the page cache, but if the application
initiates direct I/O, it should use the direct I/O path.
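For example, something like the following from the application would take
the direct I/O path even on a cache=auto mount (the path and the 4096-byte
size/alignment are just placeholders):

/* Minimal example of an application forcing the direct I/O path even
 * on a cache=auto mount: an O_DIRECT open bypasses the page cache. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/virtiofs/testfile",
		      O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0)
		return 1;

	void *buf;
	/* O_DIRECT needs aligned buffers/offsets; 4096 is a typical block size */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	memset(buf, 'a', 4096);

	ssize_t ret = pwrite(fd, buf, 4096, 0);

	free(buf);
	close(fd);
	return ret == 4096 ? 0 : 1;
}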
b) How would the current lock help for distributed clients? Or for
multiple fuse daemons (what you seem to call servers) per local machine?
I thought that the current lock is trying to protect the fuse kernel
side, and assumed the fuse server (daemon linked to libfuse) can handle
multiple parallel writes. At least that's how I thought about it. I might
be wrong; I am not sure.
For a single vfs mount point served by fuse, truncate should take the
exclusive lock and parallel writes the shared lock - I don't see a problem
here either.
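i.e. roughly this model for a single mount (a toy user-space sketch
reusing the toy_inode from earlier in the thread, not the actual kernel
code):

/*
 * Toy model of that single-mount locking: truncate takes the lock
 * exclusively, direct writes take it shared, so writes run in parallel
 * with each other but never concurrently with a truncate.
 */
static ssize_t toy_parallel_direct_write(struct toy_inode *ino,
					 const void *buf, size_t len,
					 off_t off)
{
	ssize_t ret;

	pthread_rwlock_rdlock(&ino->lock);	/* shared: other writers allowed */
	ret = pwrite(ino->backing_fd, buf, len, off);
	pthread_rwlock_unlock(&ino->lock);
	return ret;
}

static int toy_truncate(struct toy_inode *ino, off_t len)
{
	int ret;

	pthread_rwlock_wrlock(&ino->lock);	/* exclusive: blocks all writers */
	ret = ftruncate(ino->backing_fd, len);
	pthread_rwlock_unlock(&ino->lock);
	return ret;
}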
Agreed that this does not seem like a problem from the fuse kernel side.
I was just questioning where parallel direct writes become a problem, and
the answer I heard was that it is probably the fuse server (daemon linked
with libfuse) which is expecting the locking. And if that's the case, this
patch is not foolproof. It is possible that the file got truncated from a
different client (a different fuse daemon linked with libfuse).
So say A is the first fuse daemon and B is another fuse daemon. Both are
clients to some network filesystem such as NFS.
- Fuse kernel for A sees the file size as 4G.
- Fuse daemon B truncates the file to 2G.
- Fuse kernel for A has a stale cache and can send two parallel writes,
say at offsets 3G and 3.5G.
- Fuse daemon A might not like it (assuming this is a fuse daemon/user
space side limitation).
I hope I am able to explain my concern. I am not saying that this patch
is not good. All I am saying is that the fuse daemon (user space) cannot
rely on never getting two parallel direct writes which can be beyond the
file size. If the fuse kernel's cache is stale, it can happen. Just trying
to set the expectations right.
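IOW, on the daemon side I would expect something like the passthrough-style
handler below, which makes no assumption that the offset is within the size
it last reported (a sketch modelled on libfuse's passthrough example; the
xmp_write name is just illustrative):

/*
 * Sketch of a passthrough-style write handler. pwrite() happily writes
 * beyond EOF, so the daemon sees the file grow even if the kernel's
 * cached i_size was stale. A daemon that keeps its own size bookkeeping
 * has to tolerate two such extending writes arriving in parallel.
 */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <unistd.h>

static int xmp_write(const char *path, const char *buf, size_t size,
		     off_t offset, struct fuse_file_info *fi)
{
	(void) path;

	/* No "offset <= current size" assumption: a write past EOF simply
	 * extends the file (leaving a hole), which is exactly what two
	 * parallel extending writes from a stale client produce. */
	ssize_t res = pwrite(fi->fh, buf, size, offset);
	if (res == -1)
		return -errno;

	return res;
}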