Re: [PATCH] bpf, sockmap: defer sk_psock_free_link() using RCU

From: Alexei Starovoitov
Date: Tue May 21 2024 - 11:39:19 EST


On Sun, May 12, 2024 at 12:22 AM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> If a BPF program is attached to kfree() event, calling kfree()
> with psock->link_lock held triggers lockdep warning.
>
> Defer kfree() using RCU so that the attached BPF program runs
> without holding psock->link_lock.
>
> Reported-by: syzbot+ec941d6e24f633a59172@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=ec941d6e24f633a59172
> Tested-by: syzbot+ec941d6e24f633a59172@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: syzbot+a4ed4041b9bea8177ac3@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=a4ed4041b9bea8177ac3
> Tested-by: syzbot+a4ed4041b9bea8177ac3@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
> include/linux/skmsg.h | 7 +++++--
> net/core/skmsg.c | 2 ++
> net/core/sock_map.c | 2 ++
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index a509caf823d6..66590f20b777 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -66,7 +66,10 @@ enum sk_psock_state_bits {
> };
>
> struct sk_psock_link {
> - struct list_head list;
> + union {
> + struct list_head list;
> + struct rcu_head rcu;
> + };
> struct bpf_map *map;
> void *link_raw;
> };
> @@ -418,7 +421,7 @@ static inline struct sk_psock_link *sk_psock_init_link(void)
>
> static inline void sk_psock_free_link(struct sk_psock_link *link)
> {
> - kfree(link);
> + kfree_rcu(link, rcu);
> }
>
> struct sk_psock_link *sk_psock_link_pop(struct sk_psock *psock);
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..9cebfeecd3c9 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -791,10 +791,12 @@ static void sk_psock_link_destroy(struct sk_psock *psock)
> {
> struct sk_psock_link *link, *tmp;
>
> + rcu_read_lock();
> list_for_each_entry_safe(link, tmp, &psock->link, list) {
> list_del(&link->list);
> sk_psock_free_link(link);
> }
> + rcu_read_unlock();
> }
>
> void sk_psock_stop(struct sk_psock *psock)
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 8598466a3805..8bec4b7a8ec7 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -142,6 +142,7 @@ static void sock_map_del_link(struct sock *sk,
> bool strp_stop = false, verdict_stop = false;
> struct sk_psock_link *link, *tmp;
>
> + rcu_read_lock();
> spin_lock_bh(&psock->link_lock);

I think this is incorrect.
spin_lock_bh may sleep in RT and it won't be safe to do in rcu cs.

pw-bot: cr

> list_for_each_entry_safe(link, tmp, &psock->link, list) {
> if (link->link_raw == link_raw) {
> @@ -159,6 +160,7 @@ static void sock_map_del_link(struct sock *sk,
> }
> }
> spin_unlock_bh(&psock->link_lock);
> + rcu_read_unlock();
> if (strp_stop || verdict_stop) {
> write_lock_bh(&sk->sk_callback_lock);
> if (strp_stop)
> --
> 2.34.1
>