Re: file metadata via fs API

From: Steven Whitehouse
Date: Wed Aug 12 2020 - 15:34:28 EST


On 12/08/2020 19:18, Linus Torvalds wrote:
> On Tue, Aug 11, 2020 at 5:05 PM David Howells <dhowells@xxxxxxxxxx> wrote:
>> Well, the start of it was my proposal of an fsinfo() system call.
> Ugh. Ok, it's that thing.
>
> This all seems *WAY* over-designed - both your fsinfo and Miklos' version.
>
> What's wrong with fstatfs()? All the extra magic metadata seems to not
> really be anything people really care about.
>
> What people are actually asking for seems to be some unique mount ID,
> and we have 16 bytes of spare information in 'struct statfs64'.
>
> All the other fancy fsinfo stuff seems to be "just because", and like
> complete overdesign.
>
> Let's not add system calls just because we can.


The point of this is to give us the ability to monitor mounts from userspace. The original inspiration was rtnetlink, in that we need a "dump" operation to give us a snapshot of the current mount state, plus a stream of events which allows us to keep that state updated. The tricky question is what happens if the event queue overflows. Just as with netlink, that requires a resync of the current state, since we can't block mounts, of course.
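As a rough illustration of the dump-plus-events pattern described above, here is a minimal userspace-only model in Python. It does not use fsinfo() or the notification patches; `EventQueue`, `MountTable` and `Monitor` are hypothetical names invented for this sketch, and the "kernel" side is just a dictionary. The only point it tries to show is that the producer never blocks, so an overflow has to be handled by taking a fresh snapshot.

```python
from collections import deque


class EventQueue:
    """Bounded queue: when full, drop the event and record the overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, event):
        if len(self.events) >= self.capacity:
            # The producer (a mount operation) must not block, so the
            # event is lost and the consumer is told to resync.
            self.overflowed = True
            return
        self.events.append(event)

    def drain(self):
        """Return (pending events, overflow flag) and reset both."""
        events, self.events = list(self.events), deque()
        overflowed, self.overflowed = self.overflowed, False
        return events, overflowed


class MountTable:
    """Stand-in for the real mount state (hypothetical, for the sketch)."""

    def __init__(self, queue):
        self.mounts = {}
        self.queue = queue

    def mount(self, mount_id, path):
        self.mounts[mount_id] = path
        self.queue.push(("mount", mount_id, path))

    def umount(self, mount_id):
        del self.mounts[mount_id]
        self.queue.push(("umount", mount_id, None))

    def dump(self):
        """The "dump" operation: a snapshot of the current state."""
        return dict(self.mounts)


class Monitor:
    """Consumer that keeps a local copy of the mount state up to date."""

    def __init__(self, table, queue):
        self.table = table
        self.queue = queue
        self.state = table.dump()  # initial snapshot
        self.resyncs = 0

    def poll(self):
        events, overflowed = self.queue.drain()
        if overflowed:
            # Events were lost, so the local copy cannot be trusted:
            # discard it and take a fresh snapshot instead.
            self.state = self.table.dump()
            self.resyncs += 1
            return
        for kind, mount_id, path in events:
            if kind == "mount":
                self.state[mount_id] = path
            else:
                self.state.pop(mount_id, None)
```

In this model the queue is drained before the snapshot is taken, so any event that arrives afterwards is simply applied on the next poll. Applying an event twice is harmless here because both operations are idempotent on the local copy.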

The fsinfo syscall was designed to be the "dump" operation in this system. David's other patch set provides the stream of events, so the two are designed to work together. We discussed using netlink, in whatever form, a while back, and there are a number of reasons why it doesn't work (namespaces being one).

I think fstatfs might also suffer from the issue of not being easy to call on things for which you have no path (e.g. over-mounted mounts). Plus we need to know which paths to query, which is why we need to enumerate the mounts in the first place - how would we get the fds for each mount? fstatfs might give you some sb info, but it doesn't tell you the options that the sb is mounted with, and it doesn't tell you where it is mounted either.
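To make the fd problem concrete, here is a small sketch using Python's `os.fstatvfs()`, which wraps the POSIX fstatvfs() call rather than fstatfs() itself; treating the two as equivalent for this purpose is an assumption of the sketch, and `describe_fs` is a hypothetical helper name. It shows that the caller needs an open fd (and so a reachable path) before anything can be queried, and that the result carries superblock-level numbers and generic flag bits but no mount point and no filesystem-specific option string.

```python
import os


def describe_fs(path):
    """Query filesystem info for `path` through an open file descriptor."""
    # A reachable path is needed just to obtain the fd in the first place.
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstatvfs(fd)
    finally:
        os.close(fd)
    return {
        "block_size": st.f_frsize,
        "total_blocks": st.f_blocks,
        "free_blocks": st.f_bfree,
        # Generic flag bits only; the result has no field for where the
        # filesystem is mounted or for its fs-specific option string.
        "read_only": bool(st.f_flag & os.ST_RDONLY),
        "nosuid": bool(st.f_flag & os.ST_NOSUID),
    }


if __name__ == "__main__":
    print(describe_fs("/"))
```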

The overall aim is to solve some issues relating to scaling to large numbers of mounts in systemd and autofs, and also to provide a generically useful interface that other tools may use to monitor mounts in due course too. Currently parsing /proc/mounts is the only option, and that tends to be slow and is certainly not atomic. Extension to other sb-related messages is a future goal, quota being one possible application for the notifications.
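For reference, this is roughly what the /proc/mounts approach looks like from userspace, sketched in Python. `MountEntry`, `parse_mounts` and `read_mounts` are hypothetical names for this example; the field layout is the fstab-style one documented in proc(5), and the sketch assumes the only escapes present are three-digit octal sequences such as `\040` for a space. Every refresh means re-reading and re-parsing the whole file, which is where the cost comes from once the number of mounts is large.

```python
import re
from typing import List, NamedTuple


class MountEntry(NamedTuple):
    source: str
    mount_point: str
    fs_type: str
    options: List[str]


# Whitespace and backslashes inside fields appear as three-digit octal
# escapes, e.g. "\040" for a space.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text):
    """Parse the text of a mounts file into a list of MountEntry tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type, options = fields[:4]
        entries.append(
            MountEntry(
                _unescape(source),
                _unescape(mount_point),
                fs_type,
                options.split(","),
            )
        )
    return entries


def read_mounts(path="/proc/self/mounts"):
    """Read and parse the whole table; the full file is re-read each time."""
    with open(path) as f:
        return parse_mounts(f.read())
```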

If there is a simpler way to get to that goal, then that's all to the good, and we should definitely consider it,