Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)

From: Aleksa Sarai
Date: Tue May 21 2024 - 06:43:24 EST


On 2024-05-21, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Tue, May 21, 2024 at 1:28 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> >
> > On 2024-05-20, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > On Mon, 2024-05-20 at 17:35 -0400, Aleksa Sarai wrote:
> > > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > > provide a file handle and corresponding mount without needing to worry
> > > > about racing with /proc/mountinfo parsing.
>
> Both statx() and name_to_handle_at() support AT_EMPTY_PATH, so
> there is a race-free way to get a file handle and unique mount id
> for statmount().

Doing it that way would require doing an open and statx for every path
you want to get a filehandle for, tripling the number of syscalls you
need to do. This is related to the syscall overhead issue Lennart talked
about last week at LSF (though for his usecase we would need to add a
hashed filehandle in statx).

> Why do you mean /proc/mountinfo parsing?

The man page for name_to_handle_at(2) talks about needing to parse
/proc/mountinfo as well as the possible races you can hit.

> > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > > that doesn't make sense for name_to_handle_at(2).
>
> Christian is probably regretting merging AT_HANDLE_FID now ;-)
>
> Seriously, I would rearrange the AT_* flags namespace this way to
> explicitly declare the overloaded per-syscall AT_* flags and possibly
> prepare for the upcoming setxattrat(2) syscall [1].

I'm not sure that unifying the flag namespace is a good idea -- while it
would be nicer, burning a flag bit for an extension will be more
expensive because we would only have 32 bits for every possible
extension we ever plan to have.

FWIW, I think that statx should've had their own flag namespace like
move_mount and renameat2.

> [1] https://lore.kernel.org/linux-fsdevel/20240426162042.191916-1-cgoettsche@xxxxxxxxxxxxx/
>
> The reason I would avoid overloading the AT_STATX_* flags is that
> they describe a generic behavior that could potentially be relevant to
> other syscalls in the future, e.g.:
> renameat2(..., AT_RENAME_TEMPFILE | AT_FORCE_SYNC);

Yeah, you might be right that the sync-related flags aren't the right
ones to overload here.

> But then again, I don't understand why you need to extend name_to_handle_at()
> at all for your purpose...
>
> Thanks,
> Amir.
>
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> [...]
> +
> +#define AT_PRIVATE_FLAGS 0x2ff /* Per-syscall flags mask. */
> +
> +/* Common flags for *at() syscalls */
> #define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> -#define AT_EACCESS 0x200 /* Test access permitted for
> - effective IDs, not real IDs. */
> -#define AT_REMOVEDIR 0x200 /* Remove directory instead of
> - unlinking file. */
> #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> #define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal
> automount traversal */
> #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
>
> +/* Flags for statx(2) */
> #define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation
> required from statx() */
> #define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> #define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to
> be sync'd with the server */
> [...]
>
> #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
> +/* Flags for name_to_handle_at(2) */
> +#define AT_HANDLE_FID 0x200 /* file handle is needed to
> compare object identity and may not
> be usable to open_by_handle_at(2) */
> +/* Flags for faccessat(2) */
> +#define AT_EACCESS 0x200 /* Test access permitted for
> + effective IDs, not real IDs. */
> +/* Flags for unlinkat(2) */
> +#define AT_REMOVEDIR 0x200 /* Remove directory instead of
> + unlinking file. */
> +
> +/* Flags for renameat2(2) (should match legacy RENAME_* flags) */
> +#define AT_RENAME_NOREPLACE 0x001 /* Don't overwrite target */
> +#define AT_RENAME_EXCHANGE 0x002 /* Exchange source and dest */
> +#define AT_RENAME_WHITEOUT 0x004 /* Whiteout source */
> +#define AT_RENAME_TEMPFILE 0x008 /* Source file is O_TMPFILE */
> +
> +/* Flags for setxattrat(2) (should match legacy XATTR_* flags) */
> +#define AT_XATTR_CREATE 0x001 /* set value, fail if
> attr already exists */
> +#define AT_XATTR_REPLACE 0x002 /* set value, fail if attr
> does not exist */
> +

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachment: signature.asc
Description: PGP signature