Re: [PATCH v2] fs: rename S_KERNEL_FILE

From: Amir Goldstein
Date: Fri Jan 28 2022 - 06:09:43 EST


On Fri, Jan 28, 2022 at 12:12 PM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Christoph Hellwig <hch@xxxxxx> wrote:
>
> > S_KERNEL_FILE is grossly misnamed. We have plenty of files hold
>
> "held".
>
> > open by the kernel kernel using filp_open.
>
> You said "kernel" twice.
>
> And so what? Cachefiles holds files open with filp_open - but it can only do
> so temporarily, otherwise EMFILE/ENFILE and OOMs start to become a serious
> problem because it could end up holding thousands of files open (one or two
> of the xfstests cause this to happen).
>
> Further, holding the file open *doesn't* prevent cachefilesd from trying to
> cull the file to make space whilst it's "in use".
>
> Yet further, I'm not holding the directories that form the cache layout open
> with filp_open at all - I'm not reading them, so that would just be a waste of
> resources - but I really don't want cachefilesd culling them just because it
> sees they're empty whilst cachefiles is holding them ready.
>
> > This flag OTOH signals the inode as being a special snowflake that
> > cachefiles holds onto that can't be unlinked because of ..., erm, pixie
> > dust.
>
> Really? I presume you read the explanation I gave of the races that are a
> consequence of splitting the driver between the kernel and userspace? I could
> avoid them - or at least mitigate them - by keeping my own list of all the
> inodes in use by cachefiles so that cachefilesd can query it. I did, in fact,
> used to have such a list, but the core kernel already has such lists and the
> facilities to translate pathnames into internal objects, so my stuff ought to
> be redundant - all I need is one inode flag.
>
> Further, that inode flag can be shared with anyone else who wants to do
> something similar. It's just an "I'm using this" lock. There's no particular
> reason to limit its use to cachefiles. In fact, it is better if it is shared,
> so that in the unlikely event that two drivers try to use the same file, an
> error will occur.
>

Good idea, but then the helpers to set the flag should not be internal
to cachefiles, and the locking semantics should be clearly defined.
FYI, overlayfs already takes an "exclusive lock" on upper/work dir
among all ovl instances.

How do you feel about hoisting the ovl_inuse_* helpers to fs.h
and renaming s/I_OVL_INUSE/I_EXCL_INUSE?
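
To be explicit, I mean something like this untested sketch - it is just
the current ovl_inuse_* logic with s/ovl/inode/ and
s/I_OVL_INUSE/I_EXCL_INUSE/, so the names are only a suggestion:

/*
 * Try to claim exclusive use of an inode.  Returns false if another
 * user (e.g. another ovl instance or cachefiles) already claimed it.
 */
static inline bool inode_inuse_trylock(struct inode *inode)
{
	bool locked = false;

	spin_lock(&inode->i_lock);
	if (!(inode->i_state & I_EXCL_INUSE)) {
		inode->i_state |= I_EXCL_INUSE;
		locked = true;
	}
	spin_unlock(&inode->i_lock);

	return locked;
}

/* Release the exclusive claim taken by inode_inuse_trylock(). */
static inline void inode_inuse_unlock(struct inode *inode)
{
	spin_lock(&inode->i_lock);
	WARN_ON(!(inode->i_state & I_EXCL_INUSE));
	inode->i_state &= ~I_EXCL_INUSE;
	spin_unlock(&inode->i_lock);
}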

Whether "deny rmdir" should have its own flag or not I don't know,
but from the ovl POV I *think* it should not be a problem to deny rmdir
for the ovl upper/work dirs as long as ovl is mounted(?).
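
If the same flag is used to deny rmdir, the vfs side could be as simple
as this hypothetical helper (assuming the I_EXCL_INUSE flag above; I
did not think through the lock ordering vs. i_rwsem):

/*
 * Hypothetical: vfs_rmdir() would bail out with -EBUSY while a
 * subsystem holds the exclusive claim on the victim directory.
 */
static bool inode_is_excl_inuse(struct inode *inode)
{
	bool inuse;

	spin_lock(&inode->i_lock);
	inuse = inode->i_state & I_EXCL_INUSE;
	spin_unlock(&inode->i_lock);

	return inuse;
}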

From our experience, adding the exclusive lock caused regressions
in setups with container runtimes that had mount leak bugs.
I am hoping that all those mount leak bugs have been fixed, but one
never knows what sort of regressions denying rmdir of upper/work may cause.

Another problem with a generic deny of rmdir is that users getting
EBUSY have no way to figure out the reason.
At least for a specific subsystem (e.g. cachefiles), users should be
able to check whether the denied dir is in the subsystem's inventory(?)

Thanks,
Amir.