Re: [RFC PATCH 0/9] fuse: API for Checkpoint/Restore

From: Bernd Schubert
Date: Fri Mar 03 2023 - 14:43:10 EST

Next message: Rafael J. Wysocki: "Re: [PATCH v5 00/18] Self-encapsulate the thermal zone device structure"
Previous message: Darren Hart: "Re: Error reports at boot time in Ampere Altra machines since c733ebb7c"
Next in thread: Aleksandr Mikhalitsyn: "Re: [RFC PATCH 0/9] fuse: API for Checkpoint/Restore"

On 2/20/23 20:37, Alexander Mikhalitsyn wrote:

Hello everyone,

It would be great to hear your comments regarding this proof-of-concept Checkpoint/Restore API for FUSE.

Supporting FUSE C/R is a challenging task for CRIU [1]. Last year I gave a brief talk at LPC 2022
about how we handle file C/R in CRIU and which blockers we have for FUSE filesystems [2].

The main problem for CRIU is that we have to restore mount namespaces and memory mappings before the process tree.
This means that when CRIU mounts a FUSE filesystem, it can't use the original FUSE daemon from the
process tree being restored. Instead, it has to use a "fake daemon".

This leads to many other technical problems:
* the "fake" daemon has to reply to the FUSE_INIT request from the kernel and initialize the FUSE connection somehow.
This setup may be inconsistent with the original daemon (protocol version, daemon capabilities/settings
like no_open, no_flush, readahead, and so on).
* each FUSE request has a unique ID. Userspace could be confused if this unique ID sequence were reset.
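To make these restore-stage problems more concrete, here is a minimal Python model. It is not kernel or CRIU code. It assumes a hypothetical `ConnState` snapshot whose field names are loosely borrowed from the settings listed above. It also assumes a simplified request-ID counter that steps by 1. The helper names `state_mismatches` and `first_reused_id` are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConnState:
    """Hypothetical snapshot of what a FUSE connection negotiated at INIT time."""
    major: int          # protocol major version
    minor: int          # protocol minor version
    max_readahead: int  # readahead window in bytes
    no_open: bool       # daemon does not implement OPEN
    no_flush: bool      # daemon does not implement FLUSH


def state_mismatches(original, restored):
    """Return the names of the fields where the restored setup differs."""
    return [f.name for f in fields(ConnState)
            if getattr(original, f.name) != getattr(restored, f.name)]


def first_reused_id(seen, restarted_at, n):
    """Simplified model (step of 1): the kernel hands out restarted_at,
    restarted_at + 1, ...; return the first ID the daemon has already seen,
    or None if none of the next n IDs collide."""
    for uid in range(restarted_at, restarted_at + n):
        if uid in seen:
            return uid
    return None


original = ConnState(major=7, minor=36, max_readahead=131072, no_open=True, no_flush=False)
# A fake daemon answering FUSE_INIT with its own defaults (illustrative values):
fake = ConnState(major=7, minor=31, max_readahead=131072, no_open=False, no_flush=False)
print(state_mismatches(original, fake))  # ['minor', 'no_open']

seen = set(range(1, 1001))               # IDs the original daemon already handled
print(first_reused_id(seen, 1, 5))       # 1: a reset counter reuses an old ID
print(first_reused_id(seen, 1001, 5))    # None: a preserved counter does not
```

A mismatch in either check is exactly what a restore API would need to avoid by carrying the original connection state over instead of renegotiating it.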

We could work around some of these issues and implement fragile, limited FUSE support in CRIU, but IMHO that doesn't make much sense.
Btw, I've listed only CRIU restore-stage problems here. The dump stage is another story...

My proposal is not only about CRIU. The same interface could be useful for recovering FUSE mounts after a daemon crash.
The LXC project uses LXCFS [3] as a procfs/cgroupfs/sysfs emulation layer for containers. We use a scheme in which
a single LXCFS daemon handles the work for all containers, and we use bind mounts to overmount particular
files/directories in procfs/cgroupfs/sysfs. If this single daemon crashes for some reason, we are in trouble,
because we have to restart all the containers (the FUSE bind mounts become invalid after the crash).
The solution is fairly simple:
allow the existing FUSE connection to be reinitialized somehow and the daemon to be replaced on the fly.
This case is a bit simpler than CRIU's because we don't need to care about previously opened files
and other state; we are only interested in mounts.

I like your patches, small and easy to read :)
So this basically fails all existing open files; our (future) needs go beyond that. I wonder if we could extend it later and re-init the new daemon with something like "fuse_queue_recall", basically the opposite of fuse_queue_forget. I'm not sure whether fuse can access the VFS dentry cache to find out which files that would need to be done for. If not, it would need to do its own book-keeping.
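The book-keeping idea could look roughly like the following Python model. There is no `fuse_queue_recall` today. This sketch only shows what a recall might have to replay to a new daemon. In FUSE, FORGET tells the daemon to drop lookup references on a nodeid, so the tracker counts outstanding lookups per nodeid. The `NodeRefTracker` class and its methods are hypothetical. Recording the parent nodeid and name of the first lookup is an assumption about what a new daemon would need to resolve a nodeid it has never seen.

```python
class NodeRefTracker:
    """Hypothetical per-connection book-keeping for a future 'recall'."""

    def __init__(self):
        self.nlookup = {}  # nodeid -> outstanding lookup count
        self.origin = {}   # nodeid -> (parent nodeid, name) of its first lookup

    def on_lookup(self, parent, name, nodeid):
        # Each successful LOOKUP reply adds one reference the kernel holds.
        self.nlookup[nodeid] = self.nlookup.get(nodeid, 0) + 1
        self.origin.setdefault(nodeid, (parent, name))

    def on_forget(self, nodeid, count):
        # FORGET releases `count` references; drop the entry when none remain.
        left = self.nlookup.get(nodeid, 0) - count
        if left > 0:
            self.nlookup[nodeid] = left
        else:
            self.nlookup.pop(nodeid, None)
            self.origin.pop(nodeid, None)

    def recall_entries(self):
        # Entries a recall would replay: (nodeid, parent, name, nlookup).
        # Sorted by nodeid for deterministic output; a real replay would also
        # have to send parents before their children.
        return [(nid, *self.origin[nid], n) for nid, n in sorted(self.nlookup.items())]


t = NodeRefTracker()
t.on_lookup(1, "etc", 2)     # /etc looked up under the root (nodeid 1)
t.on_lookup(2, "passwd", 3)  # /etc/passwd looked up twice
t.on_lookup(2, "passwd", 3)
t.on_forget(3, 1)            # kernel forgets one of those references
print(t.recall_entries())    # [(2, 1, 'etc', 1), (3, 2, 'passwd', 1)]
```

Keeping this state on the kernel side would avoid depending on the dentry cache, at the cost of duplicating information the VFS already has.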

Thanks,
Bernd
