Re: [GIT PULL] selinux/selinux-pr-20250323

From: Paul Moore
Date: Fri Mar 28 2025 - 11:07:05 EST


On Fri, Mar 28, 2025 at 9:23 AM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> On Thu, Mar 27, 2025 at 3:40 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, 27 Mar 2025 at 12:16, Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> > >
> > > Where could/would we cache that information so that it was accessible
> > > directly by the VFS layer?
> >
> > So the VFS layer already does this for various other things. For this
> > case, the natural thing to do would be to add another IOP_xyzzy flag
> > in inode->i_opflags.
> >
> > That's how we already say things like "this inode has no
> > filesystem-specific i_op->permission function" (IOP_FASTPERM), so that
> > we don't even have to follow the "inode->i_op->permission" pointer
> > chain to see a NULL pointer.
> >
> > Yes, the VFS layer is *heavily* optimized like that. It literally does
> > that IOP_FASTPERM to avoid chasing two pointers - not even the call,
> > just the "don't even bother to follow pointers to see if it's NULL".
> > See do_inode_permission().
> >
> > And we have 16 bits in that inode->i_opflags, and currently only use 7
> > of those bits. Adding one bit for a IOP_NO_SECURITY_LOOKUP kind of
> > logic (feel free to rename that - just throwing a random name out as a
> > suggestion) would be a complete no-brainer.
> >
> > NOTE! The rule for the i_opflags accesses is that *reading* them is
> > done with no locking at all, but changing them takes the inode
> > spinlock (and we should technically probably use WRITE_ONCE() and
> > READ_ONCE(), but we don't).
> >
> > And notice that the "no locking at all for reading" means that if you
> > *change* the bit - even though that involves locking - there may be
> > concurrent lookups in process that won't see the change, and would go
> > on as if the lookup still does not need any security layer call. No
> > serialization to readers at all (although you could wait for an RCU
> > period after changing if you really need to, and only use the bit in
> > the RCU lookup).
> >
> > That should be perfectly fine - I really don't think serialization is
> > even needed. If somebody is changing the policy rules, any file
> > lookups *concurrent* to that change might not see the new rules, but
> > that's the same as if it happened before the change.
> >
> > I just wanted to point out that the serialization is unbalanced: the
> > spinlock for changing the flag is literally just to make sure that two
> > bits being changed at the same time don't stomp on each other (because
> > it's a 16-bit non-atomic field, and we didn't want to use a "unsigned
> > long" and atomic bitops because the cache layout of the inode is also
> > a big issue).
> >
> > And you can take the IOP_FASTPERM thing as an example of how to do
> > this: it is left clear initially, and what happens is that during the
> > permission lookup, if it *isn't* set, we'll follow those
> > inode->i_io->permission pointers, and notice that we should set it:
> >
> > if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
> > if (likely(inode->i_op->permission))
> > return inode->i_op->permission(idmap, inode, mask);
> >
> > /* This gets set once for the inode lifetime */
> > spin_lock(&inode->i_lock);
> > inode->i_opflags |= IOP_FASTPERM;
> > spin_unlock(&inode->i_lock);
> > }
> >
> > and I think the security layer could take a very similar approach: not
> > setting that IOP_NO_SECURITY_LOOKUP initially, but *when* a
> > security_inode_permission() call is made with just MAY_NOT_BLOCK |
> > MAY_LOOKUP, and the security layer notices that "this inode has no
> > reason to care", it could set the bit so that *next* time around the
> > VFS layer won't bother to call into security_inode_permission()
> > unnecessarily.
> >
> > Does that clarify?
>
> Yes, thank you. I think it would be easy enough to make that change to
> selinux_inode_permission() and to clear that inode flag on file
> relabels (e.g. in selinux_inode_post_setxattr() and
> inode_invalidate_secctx()). Not as sure about handling policy reloads
> / boolean changes at runtime without also caching the policy sequence
> number in the inode too so that can be compared. Further, I'm unclear
> on how to implement this in a manner that works with the LSM stacking
> support, since some other module might disagree about whether we can
> skip these lookups. Normally I'd just add an extra boolean or flag
> argument to the underlying ->inode_permission() hook so each module
> can indicate its decision and
> security/security.c:security_inode_permission() could compose the
> result, but I guess that would require switching
> security_inode_permission() from using call_int_hook() to manually
> doing the lsm_for_each_hook() loop itself which may have an adverse
> impact on those calls in general.

I'm not sure the VFS flag approach is going to end up being practical
due to various constraints, but I do have some other ideas that should
allow us to drop the bulk of the SELinux AVC calls while doing path
walks that I'm going to try today/soon. I'll obviously report back if
it works out.

--
paul-moore.com