Re: [PATCH bpf v3 1/2] bpf: fix wrong copied_seq calculation

From: Jiayuan Chen
Date: Tue Dec 24 2024 - 02:18:28 EST


On Mon, Dec 23, 2024 at 11:57:58PM +0800, Jakub Sitnicki wrote:
> On Mon, Dec 23, 2024 at 09:57 PM +01, Jakub Sitnicki wrote:
> > On Thu, Dec 19, 2024 at 05:30 PM +08, Jiayuan Chen wrote:
> >> Currently, not all modules using strparser have issues with
> >> copied_seq miscalculation. The issue exists mainly with
> >> bpf::sockmap + strparser because bpf::sockmap implements a
> >> proprietary read interface for user-land: tcp_bpf_recvmsg_parser().
> >>
> >> Both this and strp_recv->tcp_read_sock update copied_seq, leading
> >> to errors.
> >>
> >> This is why I rewrote the tcp_read_sock() interface specifically for
> >> bpf::sockmap.
> >
> > All right. Looks like reusing read_skb is not going to pan out.
> >
> > But I think we should not give up just yet. It's easy to add new code.
> >
> > We can try to break up and parametrize tcp_read_sock - if other
> > maintainers are not against it. Does something like this work for you?
> >
> > https://github.com/jsitnicki/linux/commits/review/stp-copied_seq/idea-2/
>
> Actually it reads better if we just add early bailout to tcp_read_sock:
>
> https://github.com/jsitnicki/linux/commits/review/stp-copied_seq/idea-2.1/
>
> ---8<---
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6a07d98017f7..6564ea3b6cd4 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1565,12 +1565,13 @@ EXPORT_SYMBOL(tcp_recv_skb);
> * or for 'peeking' the socket using this routine
> * (although both would be easy to implement).
> */
> -int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> - sk_read_actor_t recv_actor)
> +static inline int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t recv_actor, bool noack,
> + u32 *copied_seq)
> {
> struct sk_buff *skb;
> struct tcp_sock *tp = tcp_sk(sk);
> - u32 seq = tp->copied_seq;
> + u32 seq = *copied_seq;
> u32 offset;
> int copied = 0;
>
> @@ -1624,9 +1625,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> tcp_eat_recv_skb(sk, skb);
> if (!desc->count)
> break;
> - WRITE_ONCE(tp->copied_seq, seq);
> + WRITE_ONCE(*copied_seq, seq);
> }
> - WRITE_ONCE(tp->copied_seq, seq);
> + WRITE_ONCE(*copied_seq, seq);
> +
> + if (noack)
> + goto out;
>
> tcp_rcv_space_adjust(sk);
>
> @@ -1635,10 +1639,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> tcp_recv_skb(sk, seq, &offset);
> tcp_cleanup_rbuf(sk, copied);
> }
> +out:
> return copied;
> }
> +
> +int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t recv_actor)
> +{
> + return __tcp_read_sock(sk, desc, recv_actor, false,
> + &tcp_sk(sk)->copied_seq);
> +}
> EXPORT_SYMBOL(tcp_read_sock);
>
> +int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t recv_actor, u32 *copied_seq)
> +{
> + return __tcp_read_sock(sk, desc, recv_actor, true, copied_seq);
> +}
> +EXPORT_SYMBOL(tcp_read_sock_noack);
> +
> int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
> {
> struct sk_buff *skb;

This modification definitely reduces code duplication and is more
elegant than my previous idea. Also, if we want to avoid modifying the
strp code and avoid introducing new ops, perhaps we could revert to the
simplest solution:
'''
void sk_psock_start_strp(struct sock *sk, struct sk_psock *psock)
{
...
sk->sk_data_ready = sk_psock_strp_data_ready;
/* Replacement */
psock->saved_read_sock = sk->sk_socket->ops->read_sock;
sk->sk_socket->ops->read_sock = tcp_read_sock_noack;
}
'''
If acceptable, I can incorporate this approach in the next patch version.
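
To make the double update of copied_seq concrete, here is a minimal
userspace model of the pattern in the diff above. It is only a sketch, not
kernel code: struct model_sock, the model_* functions and the caller-owned
cursor variable are hypothetical stand-ins that I made up for illustration,
and the cleanups counter merely stands in for the tcp_cleanup_rbuf() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few tcp_sock fields that matter here. */
struct model_sock {
	uint32_t copied_seq;	/* bytes accounted as consumed by user space */
	uint32_t rcv_nxt;	/* end of the data queued on the socket */
	unsigned int cleanups;	/* counts the tcp_cleanup_rbuf() stand-in */
};

/* Models __tcp_read_sock(): advance *copied_seq, optionally skip cleanup. */
static int model_read_sock_common(struct model_sock *sk, uint32_t want,
				  bool noack, uint32_t *copied_seq)
{
	assert(copied_seq);

	uint32_t avail = sk->rcv_nxt - *copied_seq;
	uint32_t copied = want < avail ? want : avail;

	*copied_seq += copied;
	if (noack)
		return (int)copied;	/* early bailout: no cleanup, no ACK */

	if (copied > 0)
		sk->cleanups++;
	return (int)copied;
}

/* Models tcp_read_sock(): operates on the socket's own copied_seq. */
int model_read_sock(struct model_sock *sk, uint32_t want)
{
	return model_read_sock_common(sk, want, false, &sk->copied_seq);
}

/* Models tcp_read_sock_noack(): operates on a caller-owned cursor. */
int model_read_sock_noack(struct model_sock *sk, uint32_t want,
			  uint32_t *copied_seq)
{
	return model_read_sock_common(sk, want, true, copied_seq);
}

/* Models the recvmsg path, which accounts for delivered bytes itself. */
void model_recvmsg(struct model_sock *sk, uint32_t len)
{
	sk->copied_seq += len;
	sk->cleanups++;
}
```

With 100 bytes queued, calling model_read_sock() from the parser and then
model_recvmsg() for the same bytes leaves copied_seq at 200, which is past
rcv_nxt. With model_read_sock_noack() and a separate cursor, only
model_recvmsg() touches the socket's copied_seq, so it ends at 100.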

BTW, it seems CI runs checkpatch.pl with the '--strict' argument, so I
missed a few warnings that CI reported. I will fix them in the next revision.