Re: [fuse-devel] [PATCH] fuse: Add support for fuse stacked I/O

From: Hans Beckerus
Date: Fri Jan 15 2016 - 16:56:21 EST


On 2016-01-15 8:29, Nikhilesh Reddy wrote:
On Fri 15 Jan 2016 09:51:50 AM PST, Nikolaus Rath wrote:
On Jan 15 2016, Antonio SJ Musumeci <trapexit@xxxxxxxxxx> wrote:
The idea is that you want to be able to reason about open, create, etc. but
don't care about the data transfer.

I have N filesystems I wish to unionize. When I create a new file I want to
pick the drive with the most free space (or some other algo). creat is
called, succeeds, and now the application issuing this starts writing. The
FUSE fs doesn't care about the writes. It just wanted to pick the drive
this file should have been created on. Anything I'd do with the FD after
that I'm happy to short circuit. I don't need to be asked what to do when
fstat'ing this FD or anything which in FUSE hands over the 'fh'. It's just
a file descriptor for me and I'd simply be calling the same function.
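
A minimal sketch of the creation policy described above, assuming the "most free space" rule and using hypothetical helper names (free_bytes, pick_branch, create_on_best_branch are illustrative, not from any existing union file system). Only the create step makes a decision; after that the returned descriptor is an ordinary fd on the chosen branch.

```python
import os


def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def pick_branch(branches, free=free_bytes):
    """Return the branch directory with the most free space.

    `free` is injectable so that another policy (or a test) can replace it.
    """
    return max(branches, key=free)


def create_on_best_branch(branches, relpath, mode=0o644):
    """Create `relpath` on the chosen branch and return (branch, fd).

    This is the only point where the policy runs; reads, writes and fstat
    on the returned fd need no further decisions.
    """
    branch = pick_branch(branches)
    fd = os.open(os.path.join(branch, relpath),
                 os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    return branch, fd
```

Everything done with the fd after creation goes straight to the underlying file, which is why short circuiting those calls loses nothing for this kind of file system.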

Ideally I think one would want to be able to select which functions to
short circuit and maybe even have it so that a short circuited function
could propagate back through FUSE on error. But the read and write short
circuiting is probably the biggest win given the overhead.

I think you should avoid using the term "stacked" completely (which
would also make Christoph happy). There have been several discussions in
the past about adding a "fd delegation" function to FUSE. Generally, the
idea is that the FUSE userspace code tells the FUSE kernel module to
internally "delegate" writes and reads for a given file (or even a
range in that file) to a different file descriptor provided by
userspace.

I think that function would be useful, and not just for union file
systems. There are many FUSE file systems that end up writing the data
into some other file on the disk without doing any transformations on
the data itself. Especially with the range feature, they would all
benefit from the ability to delegate reads and writes.
I agree with Nikolaus here. I do believe there are use cases that could benefit from this.
I have a typical example where a FUSE fs wishes to handle reads but does not care about writes, other than that
they should be written transparently to the underlying fs. Moving a file from the underlying fs to the
FUSE mount point, if both are located on e.g. the same physical partition, would then be a more or less instant operation, right?
But this also requires that the operations are selectable: a user should be able to choose which operations to bypass.
I understand, though, that this will need adaptations to libfuse as well.
Another question is whether an inotify write-type watch on the FUSE mount point would be affected by this.

However, Miklos has said in the past that the performance gain from this
is very small. You can get almost as good a result by splicing from one
fd to the other in userspace. In that case this function could actually
be implemented completely in libfuse.


Do you have any benchmark results that compare a splice-in-userspace
approach with your patch?


Best,
-Nikolaus


Hi

@Linus
Thanks for taking the time to reply to my email. It means a lot.

FUSE allows users to implement extensions to filesystems, such as enforcing policy or permissions, without having to modify the kernel or maintain the policy in the kernel.

One such example is the one quoted by Antonio above.
Another example is a FUSE-based filesystem that tries to enforce additional permissions on a FAT-based mount point.

From what I could find, there are many FUSE-based filesystems out there that do things during the open call but simply pass the read and write I/O calls through to the local "lower" filesystem where they actually store the data.

From what I understand, unionfs, overlayfs and similar filesystems are primarily used to provide a merged or unified view of directories, and do not offer mechanisms to add policy or other checks/extensions to the I/O operations without modifying the kernel.

The main motivation is to make FUSE perform better in such use cases without losing the ease of implementing and extending in userspace.



@Nikolaus
Our local benchmarks on embedded devices (where power and CPU usage are critical) show that splice doesn't help as much, and running on multiple CPUs results in increased power usage.

The results below are from a specific device model.

IOPS is the number of 4K reads or writes that could be performed each second.

                           regular        spliced      Stacked I/O
sequential write (MiB/s)   56.55633333    100.34445    141.7096667
sequential read (MiB/s)    49.644         60.43434     122.367

random write (IOPS)        2554.333333    4053.4545    8572
random read (IOPS)         977.3333333    1223.34      1432.666667

The above tests were performed using a file size of 1 GB.

Stacked I/O showed the best performance (almost the same as the native ext4 filesystem that stores the real file).
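
To put the table in relative terms, the short script below computes the speedup of each approach over regular FUSE, using only the figures posted above. It also converts IOPS to throughput, assuming "4K" means 4 KiB per operation. The function names are illustrative.

```python
KIB_PER_OP = 4  # assumption: "4K based" reads/writes are 4 KiB each

# (regular, spliced, stacked) exactly as posted in the table above
results = {
    "sequential write (MiB/s)": (56.55633333, 100.34445, 141.7096667),
    "sequential read (MiB/s)": (49.644, 60.43434, 122.367),
    "random write (IOPS)": (2554.333333, 4053.4545, 8572),
    "random read (IOPS)": (977.3333333, 1223.34, 1432.666667),
}


def speedups(regular, spliced, stacked):
    """Return (spliced, stacked) throughput as multiples of regular FUSE."""
    return spliced / regular, stacked / regular


def iops_to_mib_per_s(iops, kib_per_op=KIB_PER_OP):
    """Convert an operations-per-second figure to MiB/s."""
    return iops * kib_per_op / 1024


for name, row in results.items():
    spliced_x, stacked_x = speedups(*row)
    print(f"{name:26s} spliced x{spliced_x:.2f}  stacked x{stacked_x:.2f}")
```

On these numbers stacked I/O comes out at roughly 2.5 times regular FUSE for both sequential tests, while the gain on random reads is noticeably smaller than on random writes.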

We also measured a 5% saving in power and in CPU timeslices used. (Splice did not improve this at all compared to default FUSE.)

Random I/O, i.e. seeking to random parts of a file and reading (use cases such as ELF and *.so loading from FUSE-based filesystems), also improved.


Similarly, using mmap'ed I/O (in an extended patch to this one, still in progress) showed a significant improvement: about 400% over default FUSE.

Also, we can call it FUSE_DELEGATED_IO if that helps :).
I chose to call it stacked I/O since we are technically stacking the FUSE reads/writes on the ext4/FAT or other filesystems.

Please let me know if you have any questions.

@everyone
Thanks so much for your comments and the interest.
Also many of you have shown support for the patch in private emails.
I would be grateful if you could voice the same support on the public thread so that everyone knows that there is interest in this patch.