Re: [RFC PATCH v1 00/11] Landlock: Namespace and capability control
From: Christian Brauner
Date: Wed Mar 25 2026 - 08:59:57 EST
On Thu, Mar 12, 2026 at 11:04:33AM +0100, Mickaël Salaün wrote:
> Namespaces are a fundamental building block for containers and
> application sandboxes, but user namespace creation significantly widens
> the kernel attack surface. CVE-2022-0185 (filesystem mount parsing),
> CVE-2022-25636 and CVE-2023-32233 (netfilter), and CVE-2022-0492 (cgroup
> v1 release_agent) all demonstrate vulnerabilities exploitable only
> through capabilities gained via user namespaces. Some distributions
> block user namespace creation entirely, but this removes a useful
> isolation primitive. Fine-grained control allows trusted programs to
> use namespaces while preventing unnecessary exposure for programs that
> do not need them.
>
> Existing mechanisms (user.max_*_namespaces sysctls, userns_create LSM
> hook, PR_SET_NO_NEW_PRIVS, and capset) each address part of this threat
> but none provides per-process, fine-grained control over both namespace
> types and capabilities. Container runtimes resort to seccomp-based
> clone/unshare filtering, but seccomp cannot dereference clone3's flag
> structure, forcing runtimes to block clone3 entirely.
>
> Landlock's composable layer model enables several patterns: a user
> session manager can restrict namespace types and capabilities broadly
> while allowing trusted programs to create the namespaces they need, and
> each deeper layer can further restrict the allowed set. Container
> runtimes can similarly deny namespace creation inside managed
> containers.
>
> This series adds two new permission categories to Landlock:
>
> - LANDLOCK_PERM_NAMESPACE_ENTER: Restricts which namespace types a
> sandboxed process can acquire: both creation (unshare/clone) and entry
> (setns). User namespace creation has no capability check in the
> kernel, so this is the only enforcement mechanism for that entry
> point.
>
> - LANDLOCK_PERM_CAPABILITY_USE: Restricts which Linux capabilities a
> sandboxed process can use, regardless of how they were obtained
> (including through user namespace creation).
>
> Both use new handled_perm and LANDLOCK_RULE_* constants following the
> existing allow-list model. The UAPI uses raw CAP_* and CLONE_NEW*
> values directly; unknown values are silently accepted for forward
> compatibility (the allow-list denies them by default). The Landlock ABI
> version is bumped from 8 to 9.
>
> The handled_perm infrastructure is designed to be reusable by future
> permission categories. The last patch documents the design rationale
> for the permission model and the criteria for choosing between
> handled_access_*, handled_perm, and scoped. A patch series to add
> socket creation control is under review [2]; it could benefit from the
> same permission model to achieve complete deny-by-default coverage of
> socket creation.
>
> This series builds on Christian Brauner's namespace LSM blob RFC [1],
> included as patch 1.
>
> Christian, could you please review patch 3? It adds a FOR_EACH_NS_TYPE
> X-macro to ns_common_types.h and derives CLONE_NS_ALL, replacing inline
> CLONE_NEW* flag enumerations in nsproxy.c and fork.c.
This all looks good to me, thanks! I'd really love to see this go in.