Re: [RFC PATCH 2/3] add statmnt(2) syscall

From: Christian Brauner
Date: Mon Sep 18 2023 - 12:20:37 EST


> Atomicity of getting a snapshot of the current mount tree with all of
> its attributes was never guaranteed, although reading
> /proc/self/mountinfo into a sufficiently large buffer would work that
> way. However, I don't see why mount trees would require stronger
> guarantees than dentry trees (for which we have basically none).

So atomicity was never put forward as a requirement. In that
session/recording I explicitly state that we won't guarantee atomicity.
And systemd agreed with this. So I think we're all on the same page.

> Even more type clean interface:
>
> struct statmnt *statmnt(u64 mnt_id, u64 mask, void *buf, size_t
> bufsize, unsigned int flags);
>
> Kernel would return a fully initialized struct with the numeric as
> well as string fields filled. That part is trivial for userspace to
> deal with.

I really would prefer a properly typed struct and that's what everyone
was happy with in the session as well. So I would not like to change the
main parameters.

> > Plus, the format for how to return arbitrary filesystem mount options
> > warrants a separate discussion imho as that's not really vfs level
> > information.
>
> Okay. Let's take fs options out of this.

Thanks.

>
> That leaves:
>
> - fs type and optionally subtype

So since subtype is FUSE specific it might be better to move this to
filesystem specific options imho.

> - root of mount within fs
> - mountpoint path
>
> The type and subtype are naturally limited to sane sizes, those are
> not an issue.

What's the limit for fstype actually? I don't think there is one.
There's one by chance but not by design afaict?

Maybe crazy idea:
That magic number thing that we do in include/uapi/linux/magic.h
is there a good reason for this or why don't we just add a proper,
simple enum:

enum {
FS_TYPE_ADFS 1
FS_TYPE_AFFS 2
FS_TYPE_AFS 3
FS_TYPE_AUTOFS 4
FS_TYPE_EXT2 5
FS_TYPE_EXT3 6
FS_TYPE_EXT4 7
.
.
.
FS_TYPE_MAX
}

that we start returning from statmount(). We can still return both the
old and the new fstype? It always felt a bit odd that fs developers to
just select a magic number.

>
> For paths the evolution of the relevant system/library calls was:
>
> char *getwd(char buf[PATH_MAX]);
> char *getcwd(char *buf, size_t size);
> char *get_current_dir_name(void);
>
> It started out using a fixed size buffer, then a variable sized
> buffer, then an automatically allocated buffer by the library, hiding
> the need to resize on overflow.
>
> The latest style is suitable for the statmnt() call as well, if we
> worry about pleasantness of the API.

So, can we then do the following struct:

struct statmnt {
__u64 mask; /* What results were written [uncond] */
__u32 sb_dev_major; /* Device ID */
__u32 sb_dev_minor;
__u64 sb_magic; /* ..._SUPER_MAGIC */
__u32 sb_flags; /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
__u32 __spare1;
__u64 mnt_id; /* Unique ID of mount */
__u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */
__u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */
__u32 mnt_parent_id_old;
__u64 mnt_attr; /* MOUNT_ATTR_... */
__u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
__u64 mnt_peer_group; /* ID of shared peer group */
__u64 mnt_master; /* Mount receives propagation from this ID */
__u64 propagate_from; /* Propagation from in current namespace */
__aligned_u64 mountpoint;
__u32 mountpoint_len;
__aligned_u64 mountroot;
__u32 mountroot_len;
__u64 __spare[20];
};

Userspace knows already how to deal with that because of bpf and other
structs (e.g., both systemd and LXC have ptr_to_u64() helpers and so
on). Libmount and glibc can hide this away internally as well.