Re: [PATCH bpf v4 2/3] bpf: Add validation for bpf_set_retval argument
From: Xu Kuohai
Date: Thu Jun 04 2026 - 22:52:07 EST
On 6/5/2026 3:19 AM, Yonghong Song wrote:
On 6/4/26 6:04 AM, Xu Kuohai wrote:
From: Xu Kuohai <xukuohai@xxxxxxxxxx>
The bpf_set_retval() helper is used by cgroup BPF programs to set the
return value of the target hook. The argument type for this helper is
ARG_ANYTHING. This allows setting a positive value, which no cgroup
hook expects and can cause issues, such as:
- BPF_LSM_CGROUP: a positive value from bpf_lsm_socket_create bypasses
the err < 0 check in __sock_create(), leaving the socket object
unallocated. The positive return value is then propagated to the
syscall entry __sys_socket(), which also bypasses the IS_ERR() guard
and ultimately causes a NULL pointer dereference.
- BPF_CGROUP_DEVICE: a positive value can be returned through cgroup
device bpf prog -> devcgroup_check_permission() -> bdev_permission()
-> bdev_file_open_by_dev(), where ERR_PTR(positive) produces a pointer
that IS_ERR() does not catch, leading to a wild pointer dereference.
- BPF_CGROUP_SOCK: a positive value can be returned through cgroup sock
bpf prog -> __cgroup_bpf_run_filter_sk() -> inet_create() ->
__sock_create(), where inet_create() frees the newly allocated sk
via sk_common_release() and sets sock->sk = NULL on the non-zero
return, but __sock_create() only checks err < 0 for cleanup, so a
positive retval bypasses cleanup and returns a socket with NULL sk
to userspace, triggering a NULL pointer dereference on subsequent
socket operations.
- BPF_CGROUP_SYSCTL: a positive value can be returned through the cgroup
bpf prog -> __cgroup_bpf_run_filter_sysctl() -> proc_sys_call_handler(),
where a non-zero return bypasses the normal sysctl proc_handler and is
returned directly to userspace as the return value of the read() or
write() syscall.
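The ERR_PTR()/IS_ERR() behaviour the BPF_CGROUP_DEVICE case relies on can be reproduced in userspace. The following is a minimal sketch, assuming simplified copies of the helpers from include/linux/err.h (the real kernel macros add unlikely() and type annotations); it is an illustration, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace copy of the kernel's ERR_PTR(): encode an error
 * code in a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Simplified userspace copy of IS_ERR(): only the top MAX_ERRNO
 * addresses, i.e. the encodings of -1 .. -MAX_ERRNO, count as errors. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

With these definitions, IS_ERR(ERR_PTR(-13)) is true, while IS_ERR(ERR_PTR(1)) is false: the positive value becomes the pointer 0x1, which falls outside the error range and so looks like a valid object to the caller.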
FYI, the following patch:
https://lore.kernel.org/bpf/20260603105317.944304-4-dawei.feng@xxxxxxxxxx/
will change the return value for BPF_CGROUP_SYSCTL from 1 to 0.
Hmm, that is a complementary fix. It updates BPF_CGROUP_SYSCTL to use 0
as the success return value, while my patch restricts the bpf prog to
returning only 0 for success or -errno for failure.
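The "0 for success or -errno for failure" rule can be written as a simple range check. This is only a sketch of the constraint: retval_is_valid() is a hypothetical name, and the actual patch may enforce the range differently (for example in the verifier rather than at runtime).

```c
#include <stdbool.h>

#define MAX_ERRNO 4095

/* Hypothetical helper, not taken from the patch: accept 0 (success) or
 * a negative errno in the range -MAX_ERRNO .. -1, reject everything else. */
static inline bool retval_is_valid(int retval)
{
	return retval <= 0 && retval >= -MAX_ERRNO;
}
```

Under this check, 0 and -EPERM (-1) pass, while 1 and -4096 are rejected.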
[...]