Re: [net-next PATCH v3 4/8] net: Change return type of sk_busy_loop from bool to void
From: David Miller
Date: Wed Mar 20 2019 - 15:40:14 EST
From: Christoph Paasch <christoph.paasch@xxxxxxxxx>
Date: Wed, 20 Mar 2019 11:35:31 -0700
> On Fri, Mar 24, 2017 at 3:23 PM Alexander Duyck
> <alexander.duyck@xxxxxxxxx> wrote:
>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>> index ea633342ab0d..4608aa245410 100644
>> --- a/net/core/datagram.c
>> +++ b/net/core/datagram.c
>> @@ -256,8 +256,12 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
>> }
>>
>> spin_unlock_irqrestore(&queue->lock, cpu_flags);
>> - } while (sk_can_busy_loop(sk) &&
>> - sk_busy_loop(sk, flags & MSG_DONTWAIT));
>> +
>> + if (!sk_can_busy_loop(sk))
>> + break;
>> +
>> + sk_busy_loop(sk, flags & MSG_DONTWAIT);
>> + } while (!skb_queue_empty(&sk->sk_receive_queue));
>
> since this change I am hitting stalls where it's looping in this
> while-loop with syzkaller.
>
> It worked prior to this change because sk->sk_napi_id was not set thus
> sk_busy_loop would make us get out of the loop.
>
> Now, it keeps on looping because there is an skb in the queue with
> skb->len == 0 and we are peeking with an offset, thus
> __skb_try_recv_from_queue will return NULL and thus we have no way of
> getting out of the loop.
>
> I'm not sure what would be the best way to fix it. I don't see why we
> end up with an skb in the list with skb->len == 0. So, shooting a
> quick e-mail, maybe somebody has an idea :-)
>
> I have the syzkaller-reproducer if needed.
Just for the record, __skb_try_recv_datagram() and it's friend
__skb_try_recv_from_queue() are my least favorite functions in the
entire tree for the past year or so.
Their current design, and how they assume things about the
implementation of SKB queues, together result in all the weird
problems we keep fixing in them.
There has to be a much better way to do this.