Re: [PATCH 06/21] VFS: Introduce a superblock configuration context [ver #3]

From: Miklos Szeredi
Date: Thu May 18 2017 - 04:10:19 EST

On Wed, May 17, 2017 at 1:31 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Miklos Szeredi <mszeredi@xxxxxxxxxx> wrote:
>> > (b) is internal-only at the moment, used by NFS submounts as triggered by
>> > automounts. There isn't currently any way to supply mount options to this.
>> And all blockdev based fs.
> I see what you're getting at. In which case there are more cases:
> (a) new mount, new sb struct with no source (eg. procfs, sysfs, tmpfs)
> (b) new mount, new sb struct, params loaded from filesystem data (eg. bdev)
> (c) new mount, new sb struct, params derived from parent (eg. NFS automount)
> (d) new mount, shared extant sb struct
> (e) remount
> In the case of (d) where we're attempting to make another mount for an extant
> super_block struct and we need to check the consistency of the parameters.

Yes. Current behavior seems to just ignore given options (except
MS_RDONLY) in that case, so we need to keep that possibility.

Also I think it would be good to allow selecting when superblock is created:

- non-exclusive create: if exists return it, if not create it
- exclusive create: only create if non-existent
- non-create: only return if exists

>> > Ah - but some of these options have to be set *inside* sget() or before the
>> > superblock becomes live, even the ones that can be changed in-flight.
>> That would be the "???" category. Any concrete examples?
> NFS is a good example. You need parameters that indicate the server to talk
> to and specify I/O parameters before you even get the superblock as you have
> to talk to the server first. I think this is particularly true of NFSv2/3
> where you need to talk to mountd.
> This would also be true of AFS. There you have to access the network to look
> up the volume ID before you can call sget() as the volume ID is part of the
> index key to the set of super_block structs.
> Further, some of these values (I/O parameters in NFS's case, for example) form
> part of the super_block struct index key, so you have to set those inside
> sget()'s set callback.

So what I propose is:

1) call ->parse_option()

would get indication what we are trying to do (find and/or
create and/or reconfig)

this step is optional, the the filesystem type could possibly be
enough for the following steps

2) call ->get_tree()

pass sc containing parsed options and flags controlling the
creation of the superblock (create/exclusive)

this step is optional, not called if we are given an sb to work
with (i.e. only reconfig)

3) call ->reconfig()

pass sc containing parsed options

this step is optional, we might be instructed just to find or
create the sb

>> >> Also I think silently ignoring options is not always the right answer.
>> >
>> > Example?
>> mount /dev/sda -oacl /mnt
>> mount /dev/sda -onoacl /mnt2
> So you'd like to give an error or a warning if ACLs are not supported, either
> by the filesystem or the kernel as a whole?

What I was getting at is that the second mount will ignore the "noacl"
option. It's not something we apparently care much about (but will
definitely want to keep as back-compat thing for the mount(2)
interface). But for the new interface I think we need something less
crazy. One solution would be the exclusive create, which doesn't have
this problem. Maybe that's enough; not sure if we need anything more

>> You are thinking on the wrong level. Of course mount(2) needs to
>> handle MS_NOSUID et al. But it's doing it now, and it isn't parsing
>> "nosuid", just translating MS_NOSUID to MNT_NOSUID.
> Ummm... That's done by the parser in this case, so effectively it is.

Where exactly? You are not touching do_mount(), which is where the
MS_*** -> MNT_*** translation is done.

>> For the fsopen() case you won't need to parse "nosuid" because that's a flag
>> for fsmount().
> Whilst this is true, that means that the parser has to operate differently in
> the mount(2) and fsopen(2) cases - which I was trying to avoid.

I don't get it. We never passed MNT_* options as strings to the
kernel. That was parsed by mount(8) and translated to MS_* flags.
So how would mount(2) and fsopen(2) need to operate differently
regarding parsing MNT_* options, when we want neither to do it?

>> The only thing fsmount() should take from the sc is the root_dentry.
>> It should be equivalent to what currently is a bind mount, except it
>> should be able to fully configure the new mount.
> It needs to take the device name as well. I wonder if it would be possible to
> store the device name on the superblock and then leave a path-in-mount in the

Ah, mnt_devname. The device name as just a special type of option and
as such should be stored in the superblock.

> vfsmount struct to fabricate a <source>:/<path> later. Though this would
> change the behaviour if someone did:
> mknod /dev/foo b 8 1
> mknod /dev/bar b 8 1
> mount /dev/foo /mnt/foo
> mount /dev/bar /mnt/bar
> as /proc/mounts would now show /dev/foo for /mnt/bar.

I'd very much hope this doesn't introduce regressions, but if it did,
then we'd have to go back to using mnt_devname...

> Also, I guess the subtype should be wangled in the superblock-getting code
> (vfs_get_tree() as of patch 21) rather than in do_new_mount_sc(). If I do
> that, then it may be that do_new_mount_sc() only needs the root dentry pointer
> and not the sb_config pointer (except for error string passing).

Would be nice.

And hopefully error string passing can be made generic and be moved
out of this set.