Re: [GIT PULL] selinux/selinux-pr-20250323

From: Stephen Smalley
Date: Fri Mar 28 2025 - 09:24:05 EST


On Thu, Mar 27, 2025 at 3:40 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 27 Mar 2025 at 12:16, Stephen Smalley
> <stephen.smalley.work@xxxxxxxxx> wrote:
> >
> > Where could/would we cache that information so that it was accessible
> > directly by the VFS layer?
>
> So the VFS layer already does this for various other things. For this
> case, the natural thing to do would be to add another IOP_xyzzy flag
> in inode->i_opflags.
>
> That's how we already say things like "this inode has no
> filesystem-specific i_op->permission function" (IOP_FASTPERM), so that
> we don't even have to follow the "inode->i_op->permission" pointer
> chain to see a NULL pointer.
>
> Yes, the VFS layer is *heavily* optimized like that. It literally does
> that IOP_FASTPERM to avoid chasing two pointers - not even the call,
> just the "don't even bother to follow pointers to see if it's NULL".
> See do_inode_permission().
>
> And we have 16 bits in that inode->i_opflags, and currently only use 7
> of those bits. Adding one bit for a IOP_NO_SECURITY_LOOKUP kind of
> logic (feel free to rename that - just throwing a random name out as a
> suggestion) would be a complete no-brainer.
>
> NOTE! The rule for the i_opflags accesses is that *reading* them is
> done with no locking at all, but changing them takes the inode
> spinlock (and we should technically probably use WRITE_ONCE() and
> READ_ONCE(), but we don't).
>
> And notice that the "no locking at all for reading" means that if you
> *change* the bit - even though that involves locking - there may be
> concurrent lookups in process that won't see the change, and would go
> on as if the lookup still does not need any security layer call. No
> serialization to readers at all (although you could wait for an RCU
> period after changing if you really need to, and only use the bit in
> the RCU lookup).
>
> That should be perfectly fine - I really don't think serialization is
> even needed. If somebody is changing the policy rules, any file
> lookups *concurrent* to that change might not see the new rules, but
> that's the same as if it happened before the change.
>
> I just wanted to point out that the serialization is unbalanced: the
> spinlock for changing the flag is literally just to make sure that two
> bits being changed at the same time don't stomp on each other (because
> it's a 16-bit non-atomic field, and we didn't want to use a "unsigned
> long" and atomic bitops because the cache layout of the inode is also
> a big issue).
>
> And you can take the IOP_FASTPERM thing as an example of how to do
> this: it is left clear initially, and what happens is that during the
> permission lookup, if it *isn't* set, we'll follow those
> inode->i_io->permission pointers, and notice that we should set it:
>
> if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
> if (likely(inode->i_op->permission))
> return inode->i_op->permission(idmap, inode, mask);
>
> /* This gets set once for the inode lifetime */
> spin_lock(&inode->i_lock);
> inode->i_opflags |= IOP_FASTPERM;
> spin_unlock(&inode->i_lock);
> }
>
> and I think the security layer could take a very similar approach: not
> setting that IOP_NO_SECURITY_LOOKUP initially, but *when* a
> security_inode_permission() call is made with just MAY_NOT_BLOCK |
> MAY_LOOKUP, and the security layer notices that "this inode has no
> reason to care", it could set the bit so that *next* time around the
> VFS layer won't bother to call into security_inode_permission()
> unnecessarily.
>
> Does that clarify?

Yes, thank you. I think it would be easy enough to make that change to
selinux_inode_permission() and to clear that inode flag on file
relabels (e.g. in selinux_inode_post_setxattr() and
inode_invalidate_secctx()). Not as sure about handling policy reloads
/ boolean changes at runtime without also caching the policy sequence
number in the inode too so that can be compared. Further, I'm unclear
on how to implement this in a manner that works with the LSM stacking
support, since some other module might disagree about whether we can
skip these lookups. Normally I'd just add an extra boolean or flag
argument to the underlying ->inode_permission() hook so each module
can indicate its decision and
security/security.c:security_inode_permission() could compose the
result, but I guess that would require switching
security_inode_permission() from using call_int_hook() to manually
doing the lsm_for_each_hook() loop itself which may have an adverse
impact on those calls in general.