Re: [PATCH net-next v4 4/4] net: tun: track dropped skb via kfree_skb_reason()
From: David Ahern
Date: Tue Mar 01 2022 - 22:29:45 EST
On 3/1/22 7:50 PM, Jakub Kicinski wrote:
> On Sat, 26 Feb 2022 00:49:29 -0800 Dongli Zhang wrote:
>> + SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */
>> + SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */
>
> IDK if these are not too low level and therefore lacking meaning.
>
> What are your thoughts David?

I agree. Not every kfree_skb is worthy of a reason. "Internal
housekeeping" errors are random, and there is nothing a user / admin can
do about those drops.

IMHO, the value of a reason code comes when it aligns with SNMP counters
(the original motivation for this direction) and conveys relevant details
such as a TCP or UDP checksum mismatch, packets for a socket that is not
open, a full socket, a full ring buffer, packets for "other host", etc.
>
> Would it be better to up level the names a little bit and call SKB_PULL
> something like "HDR_TRUNC" or "HDR_INV" or "HDR_ERR" etc or maybe
> "L2_HDR_ERR" since in this case we seem to be pulling off ETH_HLEN?
>
> For SKB_TRIM the error comes from allocation failures, there may be
> a whole bunch of skb helpers which will fail only under mem pressure,
> would it be better to identify them and return some ENOMEM related
> reason, since, most likely, those will be noise to whoever is tracking
> real errors?
>
>> SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
>> * device driver specific header
>> */
>> + SKB_DROP_REASON_DEV_READY, /* device is not ready */
>
> What is ready? link is not up? peer not connected? can we expand?

As I recall, in this case the tfile for a tun device has disappeared,
i.e., a race condition.
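
For what it is worth, the up-leveling Jakub suggests could look roughly like
the userspace sketch below. Everything named demo_* is hypothetical and the
struct fields are simplified assumptions, not the tun driver's data
structures; only the header length of 14 matches the kernel's ETH_HLEN. It
just shows the three cases from this thread reported by cause rather than by
the helper that failed.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ETH_HLEN 14	/* same value as the kernel's ETH_HLEN */

/* Hypothetical higher-level reasons, following the naming ideas in this thread. */
enum demo_reason {
	DEMO_REASON_NOT_DROPPED,
	DEMO_REASON_DEV_READY,	/* tfile went away underneath us (race) */
	DEMO_REASON_HDR_TRUNC,	/* too short to hold an Ethernet header */
	DEMO_REASON_NOMEM	/* a helper failed only because of memory pressure */
};

struct demo_pkt { size_t len; };		/* simplified stand-in for an skb */
struct demo_dev { int tfile_attached; };	/* simplified stand-in for the tun state */

/*
 * Classify a receive attempt by cause. trim_alloc_failed models a trim
 * helper failing on allocation; it is an input here, not a real helper call.
 */
enum demo_reason demo_rx(const struct demo_dev *dev, const struct demo_pkt *pkt,
			 int trim_alloc_failed)
{
	assert(dev != NULL && pkt != NULL);

	if (!dev->tfile_attached)
		return DEMO_REASON_DEV_READY;
	if (pkt->len < DEMO_ETH_HLEN)
		return DEMO_REASON_HDR_TRUNC;
	if (trim_alloc_failed)
		return DEMO_REASON_NOMEM;
	return DEMO_REASON_NOT_DROPPED;
}
```

A 10-byte packet on an attached device comes back as DEMO_REASON_HDR_TRUNC,
and any allocation failure collapses into the single DEMO_REASON_NOMEM
bucket, which keeps memory-pressure noise apart from real errors.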