Re: [PATCH RESEND] lsm: Fix the crash issue in xfrm_decode_session

From: Feng Yang

Date: Thu Mar 19 2026 - 23:25:53 EST


On Thu, 19 Mar 2026 14:22:06 -0400, Stephen Smalley wrote:
> On Thu, Mar 19, 2026 at 1:52 PM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
> >
> > On 3/18/2026 7:22 PM, Feng Yang wrote:
> > > On Wed, 18 Mar 2026 10:09:47 -0700, Casey Schaufler wrote:
> > >> On 3/17/2026 11:19 PM, Feng Yang wrote:
> > >>> From: Feng Yang <yangfeng@xxxxxxxxxx>
> > >>>
> > >>> After hooking the following BPF program:
> > >>> SEC("lsm/xfrm_decode_session")
> > >>> int BPF_PROG(lsm_hook_xfrm_decode_session, struct sk_buff *skb, u32 *secid, int ckall)
> > >>> {
> > >>> return 1; // Any non-zero value
> > >>> }
> > >>> Subsequent packet transmission triggers will cause a kernel panic:
> > >> LSM hooks that use or provide secids cannot be stacked. That is,
> > >> you can't provide a BPF LSM hook and an SELinux LSM hook and expect
> > >> correct behavior. Your proposed "fix" removes a legitimate check.
> > > I'm very sorry, I didn't quite understand what you meant.
> > >
> > > Maybe my commit message wasn't clear. I only used a BPF LSM hook without SELinux stacking enabled.
> >
> > Do i understand correctly that you do not have SELinux enabled?
> >
> > > Therefore, is it the expected behavior that simply using
> > > SEC("lsm/xfrm_decode_session")
> > > int BPF_PROG(lsm_hook_xfrm_decode_session, struct sk_buff *skb, u32 *secid, int ckall) {
> > > return -1;
> > > }
> > > would cause a kernel panic?
> >
> > Yes. As the infrastructure is written, any return other than 0 is erroneous.
> > You will need to query the SELinux developers about why they decided to
> > implement the xfrm handling in this way.
>
> Looks like this BUG_ON() has just been preserved since first being
> introduced by:
> https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=beb8d13bed80f8388f1a9a107d07ddd342e627e8

Yes, there is a comment in https://lore.kernel.org/all/Pine.LNX.4.64.0607122149070.573@d.namei/:

+static inline void security_xfrm_skb_secid(struct sk_buff *skb, u32 *secid)
{
- return security_ops->xfrm_decode_session(skb, fl);
+ BUG_ON(security_ops->xfrm_decode_session(skb, secid, 0));

BUG_ON looks wrong here, in that you don't know why the LSM returned an
error, and why should the box panic at this point at all?

But it seems no one has answered this question before.

> I think we can remove it at this point - it evidently doesn't ever
> trigger for SELinux or we would have seen it by now and it is
> definitely unsafe for BPF LSM.

Yes, the call in the SELinux module is as follows.

void security_skb_classify_flow(struct sk_buff *skb, struct flowi_common *flic)
{
int rc = call_int_hook(xfrm_decode_session, skb, &flic->flowic_secid,
0);

BUG_ON(rc);
}

int selinux_xfrm_decode_session(struct sk_buff *skb, u32 *sid, int ckall)
{
if (skb == NULL) {
*sid = SECSID_NULL;
return 0;
}
return selinux_xfrm_skb_sid_ingress(skb, sid, ckall);
}

static int selinux_xfrm_skb_sid_ingress(struct sk_buff *skb,
u32 *sid, int ckall)
{
u32 sid_session = SECSID_NULL;
struct sec_path *sp = skb_sec_path(skb);

if (sp) {
int i;

for (i = sp->len - 1; i >= 0; i--) {
struct xfrm_state *x = sp->xvec[i];
if (selinux_authorizable_xfrm(x)) {
struct xfrm_sec_ctx *ctx = x->security;

if (sid_session == SECSID_NULL) {
sid_session = ctx->ctx_sid;
if (!ckall)
goto out;
} else if (sid_session != ctx->ctx_sid) {
*sid = SECSID_NULL;
return -EINVAL;
}
}
}
}

out:
*sid = sid_session;
return 0;
}

Since ckall is 0, selinux_xfrm_skb_sid_ingress will only return 0.
Therefore, the selinux_xfrm_decode_session function will only return 0 when ckall is 0, and BUG_ON will never be triggered.

However, BPF can be exploited to cause a system crash.