Re: [PATCH 3/3] capabilities: add cap userns sysctl mask

From: Tycho Andersen
Date: Tue May 21 2024 - 10:30:01 EST


On Tue, May 21, 2024 at 01:12:57AM +0300, Jarkko Sakkinen wrote:
> On Tue May 21, 2024 at 12:13 AM EEST, Tycho Andersen wrote:
> > On Mon, May 20, 2024 at 12:25:27PM -0700, Jonathan Calmels wrote:
> > > On Mon, May 20, 2024 at 07:30:14AM GMT, Tycho Andersen wrote:
> > > > there is an ongoing effort (started at [0]) to constify the first arg
> > > > here, since you're not supposed to write to it. Your usage looks
> > > > correct to me, so I think all it needs is a literal "const" here.
> > >
> > > Will do, along with the suggestions from Jarkko
> > >
> > > > > + struct ctl_table t;
> > > > > + unsigned long mask_array[2];
> > > > > + kernel_cap_t new_mask, *mask;
> > > > > + int err;
> > > > > +
> > > > > + if (write && (!capable(CAP_SETPCAP) ||
> > > > > + !capable(CAP_SYS_ADMIN)))
> > > > > + return -EPERM;
> > > >
> > > > ...why CAP_SYS_ADMIN? You mention it in the changelog, but don't
> > > > explain why.
> > >
> > > No reason really; I was hoping we could decide what we want here.
> > > UMH uses CAP_SYS_MODULE; Serge mentioned maybe adding a new cap.
> >
> > I don't have a strong preference between SETPCAP and a new capability,
> > but I do think it should be just one. SYS_ADMIN is already god mode
> > enough, IMO.
>
> Sometimes I wonder whether it would make more sense to invent something
> completely new like capabilities, but more modern and robust, instead of
> increasing the complexity of a broken mechanism (especially thanks to
> CAP_MAC_ADMIN).
>
> I kind of liked the idea of privilege tokens both in Symbian and Maemo
> (have been involved professionally in both). Emphasis on the idea not
> necessarily on implementation.
>
> Not an LSM but like something that you could use in the place of POSIX
> caps. Probably quite a tedious effort, though, because you would need to pull
> the whole industry along with the new thing...

And then we would have LSM hooks, (ns_)capable(), and __secure_computing(),
plus a new set of hooks for this new thing, all sprinkled around. I guess
kernel developers wouldn't be excited about it, let alone the rest of
the industry :)

Thinking out loud: I wonder if fixing the seccomp TOCTOU on pointer
arguments would help here. I guess you'd still have issues where your
policy engine resolves a path arg to open() and that inode changes
between the decision and the actual VFS access; you'd just have moved
the TOCTOU.
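To make the path-resolution point concrete, here is a userspace sketch (not seccomp or kernel code) of a path-based decision going stale. A rename() stands in for the racing thread so the outcome is deterministic. The helper names, the struct scratch layout and the /tmp/toctou-XXXXXX directory are all hypothetical, and it assumes a POSIX system where rename() atomically replaces the target name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical scratch area: "victim" is the path the policy engine is
 * asked about, "attacker" is the file swapped in under that name. */
struct scratch {
	char dir[32];
	char victim[64];
	char attacker[64];
};

static int touch(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	return fd < 0 ? -1 : close(fd);
}

static void scratch_fini(struct scratch *s)
{
	unlink(s->victim);
	unlink(s->attacker); /* ENOENT once renamed away; ignored */
	rmdir(s->dir);
}

static int scratch_init(struct scratch *s)
{
	snprintf(s->dir, sizeof(s->dir), "/tmp/toctou-XXXXXX");
	if (!mkdtemp(s->dir))
		return -1;
	snprintf(s->victim, sizeof(s->victim), "%s/victim", s->dir);
	snprintf(s->attacker, sizeof(s->attacker), "%s/attacker", s->dir);
	if (touch(s->victim) || touch(s->attacker)) {
		scratch_fini(s);
		return -1;
	}
	return 0;
}

/* Decide by path, then use by path. Returns 1 if the approved inode differs
 * from the opened one, 0 if they match, -1 on error. */
static int check_then_open_sees_swap(void)
{
	struct scratch s;
	struct stat decided, used;
	int fd, ret = -1;

	if (scratch_init(&s))
		return -1;
	/* Time of check: the policy engine resolves the path and decides. */
	if (stat(s.victim, &decided))
		goto out;
	/* Simulated racing thread: replace the file under the same name. */
	if (rename(s.attacker, s.victim))
		goto out;
	/* Time of use: open() resolves the path again and gets the new inode. */
	fd = open(s.victim, O_RDONLY);
	if (fd < 0)
		goto out;
	if (fstat(fd, &used) == 0)
		ret = decided.st_ino != used.st_ino;
	close(fd);
out:
	scratch_fini(&s);
	return ret;
}

/* Open first, then decide and use through the fd. Returns 1 if check and
 * use saw the same inode despite the swap, 0 if not, -1 on error. */
static int fd_decision_stays_on_inode(void)
{
	struct scratch s;
	struct stat decided, used;
	int fd, ret = -1;

	if (scratch_init(&s))
		return -1;
	fd = open(s.victim, O_RDONLY);
	if (fd < 0)
		goto out;
	/* The rename is the same simulated race; the fd keeps the old inode. */
	if (fstat(fd, &decided) == 0 &&
	    rename(s.attacker, s.victim) == 0 &&
	    fstat(fd, &used) == 0)
		ret = decided.st_ino == used.st_ino;
	close(fd);
out:
	scratch_fini(&s);
	return ret;
}
```

On an ordinary filesystem both check_then_open_sees_swap() and fd_decision_stays_on_inode() should return 1. Deciding on an already-open fd is roughly the idea behind having a seccomp user-notification supervisor open the file itself and hand the target the fd via SECCOMP_IOCTL_NOTIF_ADDFD, though that only helps for syscalls the supervisor can fully emulate.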

Or even scarier: what if you could change the return value at any
kprobe? :)

Tycho