Re: [PATCH 00/13] VFS: Filesystem information [ver #19]

From: Miklos Szeredi
Date: Wed Apr 01 2020 - 04:18:31 EST


On Wed, Apr 1, 2020 at 7:22 AM Ian Kent <raven@xxxxxxxxxx> wrote:
>
> On Wed, 2020-03-18 at 17:05 +0100, Miklos Szeredi wrote:
> > On Wed, Mar 18, 2020 at 4:08 PM David Howells <dhowells@xxxxxxxxxx>
> > wrote:
> >
> > > ============================
> > > WHY NOT USE PROCFS OR SYSFS?
> > > ============================
> > >
> > > Why is it better to go with a new system call rather than adding
> > > more magic
> > > stuff to /proc or /sysfs for each superblock object and each mount
> > > object?
> > >
> > > (1) It can be targetted. It makes it easy to query directly by
> > > path.
> > > procfs and sysfs cannot do this easily.
> > >
> > > (2) It's more efficient as we can return specific binary data
> > > rather than
> > > making huge text dumps. Granted, sysfs and procfs could
> > > present the
> > > same data, though as lots of little files which have to be
> > > individually opened, read, closed and parsed.
> >
> > Asked this a number of times, but you haven't answered yet: what
> > application would require such a high efficiency?
>
> Umm ... systemd and udisks2 and about 4 others.
>
> A problem I've had with autofs for years is using autofs direct mount
> maps of any appreciable size cause several key user space applications
> to consume all available CPU while autofs is starting or stopping which
> takes a fair while with a very large mount table. I saw a couple of
> applications affected purely because of the large mount table but not
> as badly as starting or stopping autofs.
>
> Maps of 5,000 to 10,000 map entries can almost be handled, not uncommon
> for heavy autofs users in spite of the problem, but much larger than
> that and you've got a serious problem.
>
> There are problems with expiration as well but that's more an autofs
> problem that I need to fix.
>
> To be clear it's not autofs that needs the improvement (I need to
> deal with this in autofs itself) it's the affect that these large
> mount tables have on the rest of the user space and that's quite
> significant.


According to dhowell's measurements processing 100k mounts would take
about a few seconds of system time (that's the time spent by the
kernel to retrieve the data, obviously the userspace processing would
add to that, but that's independent of the kernel patchset). I think
that sort of time spent by the kernel is entirely reasonable and is
probably not worth heavy optimization, since userspace is probably
going to spend as much, if not more time with each mount entry.

> I can't even think about resolving my autofs problem until this
> problem is resolved and handling very large numbers of mounts
> as efficiently as possible must be part of that solution for me
> and I think for the OS overall too.

The key to that is allowing userspace to retrieve individual mount
entries instead of having to parse the complete mount table on every
change.

Thanks,
Miklos