Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same

From: Mathieu Desnoyers
Date: Mon Jan 22 2024 - 12:49:08 EST


On 2024-01-22 10:06, Steven Rostedt wrote:
On Mon, 22 Jan 2024 11:38:52 +0100
Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:

[...]

On Wed, Jan 17, 2024 at 3:37 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>

The dentries and inodes are created in the readdir for the sole purpose of
getting a consistent inode number. Linus stated that is unnecessary, and
that all inodes can have the same inode number. For a virtual file system
they are pretty meaningless.

Instead use a single unique inode number for all files and one for all
directories.

Link: https://lore.kernel.org/all/20240116133753.2808d45e@xxxxxxxxxxxxxxxxxx/

Yeah, Linus wanted me to try this first and see if there's any regressions.
Well, I guess you just answered that.

This confuses "find".
Running "find /sys/" now prints lots of error messages to stderr:

find: File system loop detected;
‘/sys/kernel/debug/tracing/events/initcall/initcall_finish’ is part of
the same file system loop as
‘/sys/kernel/debug/tracing/events/initcall’.

So at a minimum, the directories need to have unique inode numbers.

[...]

I'm using the eventfs_inode pointer to create a unique value for the inode.
But it's being salted, hashed and then truncated. As it is very easy to
read inodes (although by default, only root has access to read these
inodes), the inode numbers themselves shouldn't be able to leak kernel
addresses via the results of these inode numbers, would it?

Why use an improvised hashing function (re-purposed from
scripts/kconfig/symbol.c to a use-case which is exposed through a
userspace ABI prone to kernel address leaks) rather than simply
reserving values by setting bits in a bitmap ?

How many inodes do we realistically expect to have there ?

On my 6.1.0 kernel:

find /sys/kernel/tracing | wc -l
15598

(mainly due to TRACE_EVENT ABI files)

Hashing risks:

- Exposing kernel addresses if the hashing algorithm is broken,
- Collisions if users are unlucky (which could trigger those
'find' errors).

Those 15598 inode values fit within a single page (bitmap of
1922 bytes).

So I would recommend simply adding a bitmap per tracefs filesystem
instance to keep track of inode number allocation.

Creation/removal of files/directories in tracefs should not be
a fast-path anyway, so who cares about the speed of a find first
bit within a single page ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com