On Tue, May 28, 2024 at 05:13:04PM +0800, Gao Xiang wrote:
Hi Christian,
On 2024/5/28 16:43, Christian Brauner wrote:
On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
On 2024/5/28 11:08, Jingbo Xu wrote:
On 5/28/24 10:45 AM, Jingbo Xu wrote:
On 5/27/24 11:16 PM, Miklos Szeredi wrote:
On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
3. I don't know if a kernel based recovery mechanism is welcome on the
community side. Any comment is welcome. Thanks!
I'd prefer something external to fuse.
Okay, understood.
Maybe a kernel based fdstore (lifetime connected to that of the
container) would be a useful service more generally?
Yeah, I had indeed considered this, but I'm afraid the VFS guys would
be concerned about why we'd do this on the kernel side rather than in
userspace.
Just from my own perspective, even if it's in FUSE, the concern is
almost the same.
I wonder whether on-demand cachefiles could keep fds too in the future
(e.g. a daemonless feature could then be implemented entirely with a
kernel fdstore), but it would face the same concern, or it would
become a source of duplication.
Thanks,
Gao Xiang
I'm not sure what the VFS guys think about this and whether the kernel
side should care about this.
FWIW, I'm not convinced, and I think that's a big can of worms
security-wise and semantics-wise. I have discussed multiple times
whether a kernel-side fdstore would be something that systemd would use
if it were available, and they wouldn't use it because it provides them
with no benefits over having it in userspace.
As far as I know, there are currently roughly two ways to implement
failover mechanisms in the kernel.
The first is a FUSE-like model: here we need to keep and pass fds to
maintain the active state, and currently userspace is responsible for
the permission/security issues involved in passing fds around.
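A minimal userspace sketch of the first model, assuming a plain AF_UNIX
socketpair stands in for the fd holder (a real setup would keep the
holder, e.g. systemd, in a separate process). The daemon passes a pipe's
write end as SCM_RIGHTS ancillary data, drops its own copy as if it had
crashed, and a "restarted" daemon receives the fd back with the pipe
still open. The helpers send_fd(), recv_fd() and failover_demo() are
hypothetical; for FUSE the stashed fd would presumably be the /dev/fuse
connection rather than a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* a stream socket needs >= 1 data byte to carry the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;      /* the kernel duplicates the file ref */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Simulated failover: stash a pipe's write end, lose it, get it back. */
static int failover_demo(void)
{
    int holder[2], pipefd[2];
    char buf[3] = { 0 };
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, holder) < 0 || pipe(pipefd) < 0)
        return -1;
    if (send_fd(holder[0], pipefd[1]) < 0)
        return -1;
    close(pipefd[1]);                   /* daemon "crashes": its copy is gone */

    int restored = recv_fd(holder[1]);  /* restarted daemon fetches it back */
    if (restored < 0)
        return -1;
    ok = write(restored, "ok", 2) == 2; /* the pipe is still usable */
    close(restored);
    ok = ok && read(pipefd[0], buf, 2) == 2 && strcmp(buf, "ok") == 0;

    close(pipefd[0]);
    close(holder[0]);
    close(holder[1]);
    return ok ? 0 : -1;
}
```

The fd stays alive while it is "in flight" in the socket's receive
queue, which is the property a userspace holder relies on.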
The second is a one-device-per-instance model, for example ublk (if I
understand correctly): each active instance (/dev/ublkbX) has its own
unique control device (/dev/ublkcX). Users can assign/change DAC/MAC for
each control device, and failover recovery just needs to reopen the
control device with the proper permissions and then recover.
So, just my own thought: a kernel-side fdstore pseudo filesystem could
provide a DAC/MAC mechanism for the first model. That would be much
cleaner than having each subsystem that needs a DAC/MAC-like mechanism
do something similar independently. But again, that is just my own
thought.
The failover mechanism for /dev/ublkcX could easily be implemented using
the fdstore. The fact that they rolled their own thing is orthogonal to
this, IMHO. Implementing retrieval policies like this in the kernel is
slowly advancing into /proc/$pid/fd/ levels of complexity. That's all
better handled with appropriate policies in userspace. And cachefilesd
can similarly just stash its fds in the fdstore.
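For the userspace fdstore, a service running under systemd can hand fds
to the service manager via the sd_notify(3) protocol: a datagram with
"FDSTORE=1" (and optionally "FDNAME=...") sent to $NOTIFY_SOCKET, with
the fds attached as SCM_RIGHTS. The sketch below hand-rolls that message;
libsystemd's sd_pid_notify_with_fds() does the same job in practice. It
assumes the unit sets FileDescriptorStoreMax= to a nonzero value
(otherwise the fds are discarded) and that NotifyAccess= permits the
sender; after a restart the stored fds come back via LISTEN_FDS and
LISTEN_FDNAMES. fdstore_push() is a hypothetical helper, and wiring
cachefilesd up this way is an assumption, not something it does today.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: hand `fd` to the service manager's fdstore under
 * `name` ("FDSTORE=1" text plus SCM_RIGHTS, per sd_notify(3)).
 * `name` must be a valid FDNAME (short, no control characters or ':'). */
static int fdstore_push(int fd, const char *name)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    char text[300];
    size_t plen;

    if (path == NULL || (path[0] != '/' && path[0] != '@'))
        return -1;                  /* not running under a notify-aware manager */
    plen = strlen(path);
    if (plen >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, plen);
    if (addr.sun_path[0] == '@')
        addr.sun_path[0] = '\0';    /* '@' denotes a Linux abstract socket */

    int len = snprintf(text, sizeof(text), "FDSTORE=1\nFDNAME=%s", name);
    if (len < 0 || (size_t)len >= sizeof(text))
        return -1;

    struct iovec iov = { .iov_base = text, .iov_len = (size_t)len };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &addr;
    /* Exact length matters for abstract sockets: no trailing NUL. */
    msg.msg_namelen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + plen);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    int sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;
    ssize_t sent = sendmsg(sock, &msg, 0);
    close(sock);
    return sent == (ssize_t)len ? 0 : -1;
}
```

The retrieval policy (which fds come back, under which names) then
lives entirely in the service manager's configuration rather than in
the kernel.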