Re: [PATCH bpf v4 2/3] bpf: Add validation for bpf_set_retval argument

From: Xu Kuohai

Date: Thu Jun 04 2026 - 22:52:07 EST


On 6/5/2026 3:19 AM, Yonghong Song wrote:


On 6/4/26 6:04 AM, Xu Kuohai wrote:
From: Xu Kuohai <xukuohai@xxxxxxxxxx>

The bpf_set_retval() helper is used by cgroup BPF programs to set the
return value of the target hook. The argument type for this helper is
ARG_ANYTHING. This allows setting a positive value, which no cgroup
hook expects and can cause issues, such as:

- BPF_LSM_CGROUP: a positive value from bpf_lsm_socket_create bypasses
   the err < 0 check in __sock_create(), leaving the socket object
   unallocated. The positive return value is then propagated to the
   syscall entry __sys_socket(), which also bypasses the IS_ERR() guard
   and ultimately causes a NULL pointer dereference.

- BPF_CGROUP_DEVICE: a positive value can be returned through cgroup
   device bpf prog -> devcgroup_check_permission() -> bdev_permission()
   -> bdev_file_open_by_dev(), where ERR_PTR(positive) produces a pointer
   that IS_ERR() does not catch, leading to a wild pointer dereference.

- BPF_CGROUP_SOCK: a positive value can be returned through cgroup sock
   bpf prog -> __cgroup_bpf_run_filter_sk() -> inet_create() ->
   __sock_create(), where inet_create() frees the newly allocated sk
   via sk_common_release() and sets sock->sk = NULL on the non-zero
   return, but __sock_create() only checks err < 0 for cleanup, so a
   positive retval bypasses cleanup and returns a socket with NULL sk
   to userspace, triggering a NULL pointer dereference on subsequent
   socket operations.

- BPF_CGROUP_SYSCTL: a positive value can be returned through the cgroup
   bpf prog -> __cgroup_bpf_run_filter_sysctl() -> proc_sys_call_handler(),
   where a non-zero return bypasses the normal sysctl proc_handler and is
   returned directly to userspace as the return value of the read() or
   write() syscall.
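
To illustrate why a positive value slips through both kinds of guard mentioned above, here is a minimal userspace sketch. ERR_PTR(), IS_ERR() and MAX_ERRNO are re-declared locally to mirror the definitions in include/linux/err.h; err_lt_zero_catches() and demo_checks() are hypothetical names used only for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the kernel's error-pointer encoding. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses count as encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical model of an "if (err < 0) goto cleanup;" style guard. */
static inline bool err_lt_zero_catches(int err)
{
	return err < 0;
}

/* Hypothetical usage example: negative errno is caught, positive is not. */
void demo_checks(void)
{
	assert(IS_ERR(ERR_PTR(-1)));     /* -EPERM style value is caught */
	assert(!IS_ERR(ERR_PTR(1)));     /* positive value looks like a valid pointer */
	assert(err_lt_zero_catches(-1)); /* cleanup path taken */
	assert(!err_lt_zero_catches(1)); /* cleanup path skipped */
}
```

In this model ERR_PTR(1) is simply the address 0x1, so a caller that trusts IS_ERR() goes on to dereference it.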

FYI, the following patch:
    https://lore.kernel.org/bpf/20260603105317.944304-4-dawei.feng@xxxxxxxxxx/
will change the return value for BPF_CGROUP_SYSCTL from 1 to 0.


Hmm, it is a complementary fix. It updates BPF_CGROUP_SYSCTL to use 0
as the success return value, while my patch restricts the bpf prog to
returning only 0 for success or -errno for failure.
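
As a rough sketch of the "0 for success or -errno for failure" constraint, the accepted range could be expressed as below. This is only an illustration in plain C and is not taken from the patch: retval_is_valid() is a hypothetical name, and using MAX_ERRNO as the lower bound is an assumption borrowed from the kernel's error-pointer range.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: same bound the kernel uses for encoded error pointers. */
#define MAX_ERRNO 4095

/* Hypothetical predicate: accept only 0 or a value in [-MAX_ERRNO, -1]. */
static inline bool retval_is_valid(int retval)
{
	return retval <= 0 && retval >= -MAX_ERRNO;
}

/* Hypothetical usage example covering the boundary values. */
void demo_retval(void)
{
	assert(retval_is_valid(0));          /* success */
	assert(retval_is_valid(-1));         /* -EPERM style failure */
	assert(retval_is_valid(-MAX_ERRNO)); /* lowest accepted value */
	assert(!retval_is_valid(1));         /* positive values rejected */
	assert(!retval_is_valid(-MAX_ERRNO - 1));
}
```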

[...]