Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
From: Bernd Schubert
Date: Thu May 05 2022 - 11:13:17 EST
On 5/5/22 14:54, Vivek Goyal wrote:
On Thu, May 05, 2022 at 11:42:51AM +0530, Dharmendra Hans wrote:
Here are the numbers I took last time. These were taken on tmpfs to
actually see the effect of the reduced number of calls. On local file
systems it might not be as visible. But we have observed that on systems
where thousands of clients hammer the metadata servers, it helps a lot.
(We have not taken numbers there yet, as we would need to change a lot
of our client code, but will do so later on.)
Note that the performance numbers have not changed with the new version
of these patches; we have only refactored the code, and the
functionality has remained the same since then.
here is the link to the performance numbers
https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@xxxxxxxxx/
There is a lot going on in that table. I am trying to understand it.
- Why care about No-Flush? I mean, that's independent of these changes,
right? I am assuming this means that upon file close we do not send
a flush to the fuse server. Not sure how bringing No-Flush into the
mix is helpful here.
It is basically removing another call from kernel to user space. The
more calls there are in total, the lower the resulting percentage gain
for atomic-open.
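The arithmetic behind that can be sketched as follows. This is an illustrative simulation only: the request names match the FUSE protocol, but the per-cycle counts are simplified assumptions, not measured values.

```python
# Sketch: how suppressing FLUSH changes the relative gain of atomic
# create. Counts per create+close cycle are simplified assumptions.

def requests_per_create(atomic_open: bool, send_flush: bool) -> list[str]:
    """FUSE requests the kernel issues for one create+close cycle."""
    if atomic_open:
        reqs = ["CREATE"]            # lookup folded into the create+open
    else:
        reqs = ["LOOKUP", "CREATE"]  # separate (d)entry lookup first
    if send_flush:
        reqs.append("FLUSH")         # sent on close() unless suppressed
    reqs.append("RELEASE")
    return reqs

for flush in (True, False):
    base = requests_per_create(atomic_open=False, send_flush=flush)
    patched = requests_per_create(atomic_open=True, send_flush=flush)
    gain = 1 - len(patched) / len(base)
    print(f"flush={flush}: {len(base)} -> {len(patched)} ({gain:.0%} fewer)")
```

With the flush in the mix the same saved LOOKUP is a smaller fraction of the total, which is why the No-Flush columns make the atomic-open percentages look better.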
- What is "Patched Libfuse"? I am assuming that these are changes
needed in libfuse to support atomic create + atomic open. Similarly
assuming "Patched FuseK" means patched kernel with your changes.
Yes, I did that to ensure there is no regression with the patches when
the other side is not patched.
If this is correct, I would probably only be interested in
looking at "Patched Libfuse + Patched FuseK" numbers to figure out
what's the effect of your changes w.r.t vanilla kernel + libfuse.
Am I understanding it right?
Yes.
- I am wondering why we measure "Sequential" and "Random" patterns.
These optimizations are primarily for file creation + file opening,
and the I/O pattern should not matter.
bonnie++ does this automatically, and it is just convenient to take the
bonnie++ CSV values and paste them into a table.
In our HPC world mdtest is more common, but it has MPI as a requirement,
which makes it harder to run. Reproducing the values with bonnie++
should be rather easy for you.
The only issue with bonnie++ is that it does not run multi-threaded by
default, and the old 3rd-party perl scripts I used to run it with
multiple processes and sum up the values no longer work with recent
perl versions. I need to find some time to fix that.
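A minimal replacement for those summing scripts could look like the sketch below. It is a rough sketch only: bonnie++'s CSV column meanings and positions depend on the bonnie++ version, so the sample line here is made up and the code simply sums whatever columns parse as integers.

```python
# Sketch: sum the numeric columns of several bonnie++ CSV result lines
# (one line per process). Non-numeric columns (labels, sizes, latency
# strings) are kept from the first line. Not a drop-in replacement for
# the old perl scripts - column semantics vary by bonnie++ version.
import csv
import io

def sum_bonnie_csv(rows: str) -> list[str]:
    lines = list(csv.reader(io.StringIO(rows)))
    out = list(lines[0])
    for line in lines[1:]:
        for i, field in enumerate(line):
            try:
                out[i] = str(int(out[i]) + int(field))
            except ValueError:
                pass  # non-numeric field: keep the first line's value
    return out

# Made-up two-process sample: name, file size, creates/s, reads/s
print(sum_bonnie_csv("p0,1G,1234,50\np1,1G,1111,45\n"))
```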
- Also wondering why the performance of Read/s improves. Assuming the
file has already been opened, I think your optimizations get out of
the way (no create, no open) and we are just going through the data
path of reading file data, with no lookups happening. If that's the
case, why do the Read/s numbers show an improvement?
That is how bonnie++ works. It creates the files, closes them (which
causes the flush), and then re-opens them for stat and read - atomic
open comes into the picture here. Also, read() is skipped entirely when
the files are empty - which is why one should use something like 1-byte
files.
If you have another metadata benchmark - please let us know.
- Why do we measure "Patched Libfuse"? It shows a performance regression
of 4-5% in the 0B, Sequential workload table. That sounds bad. So
without any optimization kicking in, it has a performance cost.
Yes, I'm not sure yet. Not much code has changed on the libfuse side.
However, the table needs to be redone with a fixed libfuse - limiting
the number of threads caused permanent libfuse thread creation and
destruction:
https://github.com/libfuse/libfuse/pull/652
The numbers in the table were also taken with passthrough_ll, which has
its own issue due to a linear inode search. passthrough_hp uses a C++
map and avoids that. I noticed this too late, when I started to
investigate why there are regressions...
Also, the table made me investigate/profile all the fuse operations,
which resulted in my waitq question. Please see that thread for more
details:
https://lore.kernel.org/lkml/9326bb76-680f-05f6-6f78-df6170afaa2c@xxxxxxxxxxx/T/
Regarding atomic open/create avoiding lookup/revalidate: our primary
goal is to reduce network calls. A file system that handles everything
locally only reduces the number of fuse kernel/user-space crossings. A
network file system that fully supports it needs to do the atomic open
(or, in old terms, lookup-intent-open) on the server side of the
network and needs to transfer attributes together with the open result.
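The server-side difference can be sketched like this. The operation names below are illustrative, not a real wire protocol; the point is only that the intent-open reply must carry the attributes along with the open result so no follow-up call is needed.

```python
# Sketch: server round trips a network FS client needs before it can
# read a freshly looked-up file. Operation names are illustrative.

def open_round_trips(lookup_intent_open: bool) -> list[str]:
    if lookup_intent_open:
        # One request resolves the name, opens the file, and returns
        # the attributes needed to instantiate the inode.
        return ["LOOKUP_INTENT_OPEN"]
    # Classic path: resolve the name first, then open the file.
    return ["LOOKUP", "OPEN"]

print(len(open_round_trips(False)), "->", len(open_round_trips(True)))
```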
Lustre does this, although I cannot easily point you to the right code.
It all started almost two decades ago:
https://groups.google.com/g/lucky.linux.fsdevel/c/iYNFIIrkJ1s
BeeGFS does this as well
https://git.beegfs.io/pub/v7/-/blob/master/client_module/source/filesystem/FhgfsOpsInode.c
See for example FhgfsOps_atomicOpen() and FhgfsOps_createIntent().
(FhGFS is the old name, from when I was still involved in the project.)
Off the top of my head, I'm not sure whether NFS does it over the wire
- maybe v4 does.
Thanks,
Bernd