[PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets

From: Michael Bommarito

Date: Tue May 12 2026 - 16:52:13 EST


This series fixes a size_t underflow in net/ipv4/ah4.c:ah_output()
reachable when a raw IP_HDRINCL socket sends a packet with ihl < 5
through an xfrm AH policy. Originally triaged on security@xxxxxxxxxx;
moving to netdev at Herbert's suggestion so nftables / netfilter
maintainers can weigh in on a related question (see "Open question"
below). Herbert also asked for the malformed packet to be rejected
upstream of AH rather than guarded at the AH consumer; that is
patch 1/2. v1's AH-side guard is kept here as 2/2 defense-in-depth.

Bug
---

In net/ipv4/ah4.c, ah_output_done() and ah_output() copy the IPv4
options area with

    if (top_iph->ihl != 5) {
            memcpy(dst, src, top_iph->ihl * 4 - sizeof(struct iphdr));
    }

The "!= 5" guard correctly excludes the no-options case but does
NOT exclude ihl < 5. For ihl in [0, 4], top_iph->ihl * 4 is less
than sizeof(struct iphdr) (20). Because sizeof yields a size_t, the
int product is converted to size_t before the subtraction, so the
result wraps around instead of going negative. The resulting length
is close to SIZE_MAX and memcpy walks off the slab allocation
backing the skb's network header.
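
The wraparound can be checked in user space. The following is a minimal
sketch, not kernel code: IPHDR_SIZE stands in for sizeof(struct iphdr),
and options_len_unchecked() is a made-up name for the length expression
passed to memcpy().

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sizeof(struct iphdr): the fixed IPv4 header is 20 bytes. */
#define IPHDR_SIZE ((size_t)20)

/*
 * Hypothetical helper mirroring the memcpy() length expression.
 * "ihl * 4" is computed as int, then converted to size_t for the
 * subtraction, so any ihl below 5 wraps to a value near SIZE_MAX.
 */
static size_t options_len_unchecked(unsigned char ihl)
{
	return ihl * 4 - IPHDR_SIZE;
}
```

With a 64-bit size_t, options_len_unchecked(0) is 0xFFFFFFFFFFFFFFEC
(SIZE_MAX - 19), and ihl = 5 gives 0, the no-options case.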

The malformed packet arrives via raw_send_hdrinc() in net/ipv4/raw.c.
raw_send_hdrinc() validates "iphlen > length" but does not reject
"iphlen < sizeof(struct iphdr)". An IP_HDRINCL caller with
CAP_NET_RAW (acquirable in an unprivileged user+net namespace on a
distro kernel with CONFIG_USER_NS=y) can therefore craft an ihl < 5
packet; if a matching xfrm AH policy is installed on the outgoing
route, ah_output() runs on the crafted packet and panics the host
kernel.
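
To make the gap concrete, here is a user-space sketch of the two bounds
on the caller-supplied header length. It is an illustration only and not
the text of the patch: hdrinc_ihl_ok() and IPV4_MIN_HDR_LEN are
hypothetical names, and the real check operates on the iphdr inside
raw_send_hdrinc().

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for sizeof(struct iphdr). */
#define IPV4_MIN_HDR_LEN ((size_t)20)

/*
 * Hypothetical helper: returns true when the header length implied by
 * ihl is plausible for a packet of "length" bytes.
 */
static bool hdrinc_ihl_ok(unsigned char ihl, size_t length)
{
	size_t iphlen = (size_t)ihl * 4;

	/* Upper bound: the header must fit inside the packet. */
	if (iphlen > length)
		return false;

	/* Lower bound: anything shorter than the fixed header is malformed. */
	if (iphlen < IPV4_MIN_HDR_LEN)
		return false;

	return true;
}
```

With only the upper bound, a 128-byte packet with ihl = 0 passes; adding
the lower bound rejects every ihl in [0, 4].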

The guard has been in place since 1da177e4c3f4 ("Linux-2.6.12-rc2",
2005). I found no prior fix on lore (3-year window) and no CVE
against the file.

Reproduction
------------

x86 + KASAN (QEMU KVM, net-next 7.1.0-rc2):

    BUG: KASAN: out-of-bounds in ah_output+0x696/0x19e0
    Read of size 18446744073709551596 at addr ffff88800bae9824 \
        by task trigger_ah4_ihl/97
    Call Trace:
     __asan_memcpy+0x23/0x60
     ah_output+0x696/0x19e0
     xfrm_output_resume+0xdc8/0x6280
     xfrm4_output+0xfe/0x4c0
     raw_sendmsg+0x2531/0x26f0
     __sys_sendto+0x32b/0x390
     __x64_sys_sendto+0xdf/0x1f0
     do_syscall_64+0xf3/0x6a0
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    The buggy address belongs to the object at ffff88800bae9800
     which belongs to the cache kmalloc-1k of size 1024
    The buggy address is located 36 bytes inside of
     1024-byte region [ffff88800bae9800, ffff88800bae9c00)

The read size 0xFFFFFFFFFFFFFFEC (SIZE_MAX - 19) is the
underflowed result of (top_iph->ihl * 4 - sizeof(struct iphdr))
for ihl = 0. Trigger: veth pair (loopback bypasses
xfrm_output), xfrm AH transport-mode policy, IP_HDRINCL
sendto() of a 128-byte packet with iph->ihl in [0, 4].

A container-only variant (CAP_NET_ADMIN container, no
--privileged, no host networking) panics the host kernel on a
stock distro kernel with CONFIG_INET_AH=m + module autoload.
Repro harness + container Dockerfile + console logs available
privately on request; not attached to this public posting.

Patches
-------

1/2 ipv4: raw: reject IP_HDRINCL packets with ihl < 5

Upstream-of-AH fix. An IPv4 header with ihl < 5 is malformed
by definition (RFC 791) and must not be allowed to continue
along the in-stack output path. This is the primary fix.

2/2 ipv4: ah: harden ah_output options-copy guard against ihl < 5

Defense-in-depth at the three memcpy sites in ah_output() and
ah_output_done(). Changes "if (top_iph->ihl != 5)" to
"if (top_iph->ihl > 5)" so a future path delivering an ihl < 5
packet cannot re-introduce the OOB access. With patch 1/2 in
place an IP_HDRINCL-crafted ihl < 5 packet should no longer
reach ah_output; this patch closes the OOB primitive
specifically at the AH consumer.
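
The difference between the two guards can be shown with a small
user-space sketch. The function names are made up for illustration;
only the two comparisons come from the description above.

```c
#include <stdbool.h>

/* Original guard: true for every ihl except 5, which includes 0..4. */
static bool copies_options_old(unsigned char ihl)
{
	return ihl != 5;
}

/* Hardened guard: true only when option bytes follow the 20-byte header. */
static bool copies_options_new(unsigned char ihl)
{
	return ihl > 5;
}
```

The two predicates agree for ihl >= 5 and differ only for ihl in [0, 4],
the range in which the copy length wraps around.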

Open question for netfilter / netdev
------------------------------------

After patch 1/2 lands, a caller with CAP_NET_ADMIN can still
deliver an ihl < 5 packet into the post-LOCAL_OUT in-stack path by
attaching an nftables payload-set rule on NF_INET_LOCAL_OUT (or an
NFQUEUE reinject on the same hook) that rewrites byte 0 of the
IPv4 header after the raw_send_hdrinc / __ip_local_out validation
has run. Construction:

    nft add table ip mangle
    nft add chain ip mangle output { type filter hook output \
        priority -150 \; }
    nft add rule ip mangle output ip daddr <victim> \
        @nh,0,8 set 0x40

I reproduced this separately with nftables payload-set delivering an
ihl = 0 packet to xfrm4_output() and onward. Patch 2/2 covers the
AH consumer; other consumers that read iph->ihl after the LOCAL_OUT
hook may be similarly exposed and I have not enumerated them.
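
For reference, byte 0 of an IPv4 header carries the version in its high
nibble and ihl in its low nibble, so the 0x40 written by the rule above
leaves version 4 and sets ihl to 0. A minimal user-space sketch of the
decoding, with hypothetical helper names:

```c
#include <stdint.h>

/* High nibble of byte 0: IP version. */
static unsigned int ipv4_version(uint8_t byte0)
{
	return byte0 >> 4;
}

/* Low nibble of byte 0: header length in 32-bit words. */
static unsigned int ipv4_ihl(uint8_t byte0)
{
	return byte0 & 0x0f;
}
```

A well-formed header without options starts with 0x45 (version 4,
ihl 5); 0x40 keeps the version intact while zeroing the header length.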

Direction question rather than a fix proposal: does basic iphdr
re-sanitization after a header-mangling hook belong in the netfilter
machinery, in each in-stack consumer, or both?

Michael Bommarito (2):
  ipv4: raw: reject IP_HDRINCL packets with ihl < 5
  ipv4: ah: harden ah_output options-copy guard against ihl < 5

 net/ipv4/ah4.c | 6 +++---
 net/ipv4/raw.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 73d587ae684d176fac9db94173f77d78a794ea4f
--
2.53.0