Re: [PATCH 2/4] cgroup: add bpf hook for attach

From: Michal Koutný

Date: Mon Mar 09 2026 - 12:52:15 EST

Hello.

On Fri, Feb 27, 2026 at 02:44:27PM +0100, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> So calling this misconfiguration is like taking a shortcut
> by simply pointing to a different destination. But fine, let's say you
> insist on this not being valid.

I understand this by analogy with filesystem organization -- there's the
package manager that ensures files are put in the right places,
non-conflicting and trackable. Subtrees may be delegated (e.g.
/usr/local). If root (or whoever has perms for it) decides to
manipulate the files, it's up to them what they end up with.

> The implementation atop of a single superblock like cgroupfs is
> questionable.

(This is an interesting topic which I'd like to discuss some other
time, so as not to digress here.)

> But in general the point is that there's no mechanism to enforce cgroup
> tree policy currently in a sufficiently flexible manner.
>
> > root detaching/disabling these hook progs anyway?)
>
> I cannot help but read this as you asking me "What if you're too dumb to
> write a security policy that isn't self-defeating?" :)

I was just trying to express that there's only one level of root (user).
(A cautionary example is "containers" that are executed as (host) root.
Leaving lockdown aside.)

> bpf has security hooks for itself including security_bpf(). First thing
> that comes to mind is to have security.bpf.* or trusted.* xattrs on
> selected processes like PID 1 that mark it as eligible for modifying BPF
> state or BPF LSM programs supervising link/prog detach, update etc and
> then designating only PID 1 as handing out those magical xattrs. Can be
> as fine-grained as needed and that tells everyone else to go away and do
> something else.

(These are too many new concepts for me, I must skip it now. I may catch
up after more study.)

> systemd will gain the ability to implement policy to control cgroup tree
> modifications in as much details as it needs without having the kernel
> in need to be aware of it. This can take various forms by marking only
> select processes as being eligible for managing cgroup migrations or
> even just locking down specific cgroups.

This is how I understand the goal could be expressed in current terms:

a) allowlisting processes that can do migrations
# common ancestor of all + access to each dst
chown -R :grA $root_cgroup/cgroup.procs
chmod -R g+w $root_cgroup/cgroup.procs

# static (-a appends; plain -G would replace the user's other
# supplementary groups):
usermod -aG grA user_of_pid
(re)start pid
# or in spawner:
fork
setgroups([grA])
exec
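
(A note on the chown/chmod lines in a): as far as I can tell, -R applied
to a single cgroup.procs file does not descend into child cgroups, so
covering "each dst" needs a walk over the subtree. Below is a minimal
sketch of that, assuming a POSIX sh and find; grant_procs_write is a
hypothetical helper name, not an existing tool. On a real cgroupfs it
would need sufficient privileges, and cgroups created afterwards would
not pick up the group automatically.)

```shell
#!/bin/sh
# Sketch only: give a group write access to every cgroup.procs file
# found under a cgroup subtree.
# grant_procs_write is a hypothetical helper name used for illustration.
#   $1 = group name or numeric gid (e.g. grA)
#   $2 = root of the cgroup subtree (e.g. $root_cgroup)
grant_procs_write() {
    group=$1
    root=$2
    # Walk the subtree; chmod only runs on files where chgrp succeeded.
    find "$root" -type f -name cgroup.procs \
        -exec chgrp "$group" {} \; \
        -exec chmod g+w {} \;
}
```

Usage would be along the lines of: grant_procs_write grA "$root_cgroup"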

b) rules that are specific to cgroup (subtree)
# applying the same as above, but to a $lower_group

$ setfacl -R -m g:grB:w $lower_group/cgroup.procs
setfacl: cgroup.procs: Operation not supported
# here I notice why the current impl isn't sufficient

Also, if I understand this correctly, you seem to move from semantics
where users (UIDs) are the subjects to different ones where the subjects
are processes (PIDs).

> The policy needs to be flexible so it can be live-updated, switched into
> auditing mode, and losened, tightened on-demand as needed.

OK.
(I'd add that the policy should also be easy to debug/troubleshoot.)

> Ok, let's start with cgroup_can_fork(). The sched ext hook isn't a
> generic permission check. It's called way after
> cgroup_attach_permissions() and is a per cgroup controller check that is
> only called for some cgroup controllers. So effectively useless to pull
> up (Notice also, how some controllers like cpuset call additional
> security hooks already.).

There could be one BPF predicate (on the cgroup level) that is
potentially passed per-controller data, so that the function could use
them (or not). It's true that the semantics would be a bit different
because of the implicit migrations that happen with controller
en-/disablement.

What I don't like about the multiple hooks is that there'd be several
places to check when one is trying to figure out why a migration failed.

> On top of that this looks like a category mistake imho. The callbacks
> are a dac-like permission mechanism whereas the hooks is actual mac
> permission checking. I'm not sure lumping this together with
> per-cgroup-controller migration preparations will be very clean. I think
> it will end up looking rather confusing. But that's best left to you
> cgroup maintainers, I think.

This paragraph pointed me to (yet) another mechanism in the kernel (you
also mentioned it with cpuset) -- the LSM hooks. Namely, if this were a
security_cgroup_attach() hook, the logic could be expressed with the
existing LSM modules, one of which, IIUC, is BPF. Would that provide the
behaviors you're missing?

(I'm proposing this as a potentially less confusing, known "evil"
approach to the scenarios considered above.)

Thanks,
Michal