regression with napi/softirq ?

From: Sudip Mukherjee
Date: Wed Jul 17 2019 - 16:19:32 EST


Hi All,

I am using v4.14.55 on an Intel Atom based board and I frequently see
dropped network packets in Wireshark captures. After a lot of debugging, it
seems that when this happens the NET_RX softirq takes a very long time to
start after it has been raised. This is a small snippet from ftrace:

<...>-2110 [001] dNH1 466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
<...>-2110 [001] dNH1 466.634917: softirq_raise: vec=3 [action=NET_RX]
<...>-2110 [001] dNH1 466.634918: irq_handler_exit: irq=126 ret=handled
ksoftirqd/1-15 [001] ..s. 466.635826: softirq_entry: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] ..s. 466.635852: softirq_exit: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] d.H. 466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
ksoftirqd/1-15 [001] d.H. 466.635857: softirq_raise: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] d.H. 466.635858: irq_handler_exit: irq=126 ret=handled
ksoftirqd/1-15 [001] ..s. 466.635860: softirq_entry: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] ..s. 466.635863: softirq_exit: vec=3 [action=NET_RX]

So, the softirq was raised at 466.634917 but did not start until 466.635826,
909 usec after it was raised.
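
For anyone who wants to reproduce the measurement, here is a minimal sketch
of how the raise-to-entry delay could be extracted from an ftrace text dump.
It is not the tool I used; the function names (to_usec,
raise_to_entry_latencies) are made up for illustration, and it assumes the
line layout shown in the snippet above (CPU in brackets, a flags field, a
timestamp in seconds with six fractional digits, then the event name and
vec=N). It pairs the earliest pending softirq_raise with the next
softirq_entry for the same CPU and vector.

```python
import re

# Assumed ftrace text layout: "... [cpu] flags seconds.usec: event: vec=N ..."
EVENT_RE = re.compile(
    r"\[(\d+)\]\s+\S+\s+(\d+\.\d+):\s+(softirq_raise|softirq_entry):\s+vec=(\d+)"
)


def to_usec(timestamp):
    """Convert a 'seconds.fraction' string to integer microseconds (no float rounding)."""
    sec, frac = timestamp.split(".")
    return int(sec) * 1_000_000 + int(frac.ljust(6, "0")[:6])


def raise_to_entry_latencies(lines):
    """Return (cpu, vec, delay_usec) for each raise -> entry pair found."""
    pending = {}  # (cpu, vec) -> timestamp of the earliest unserviced raise
    results = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a softirq_raise/softirq_entry line
        cpu, stamp, event, vec = match.groups()
        key = (int(cpu), int(vec))
        now = to_usec(stamp)
        if event == "softirq_raise":
            # Keep the first raise; later raises before entry do not reset the clock.
            pending.setdefault(key, now)
        elif key in pending:
            results.append((key[0], key[1], now - pending.pop(key)))
    return results


# The softirq lines from the trace snippet above.
SAMPLE = """\
<...>-2110 [001] dNH1 466.634917: softirq_raise: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] ..s. 466.635826: softirq_entry: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] d.H. 466.635857: softirq_raise: vec=3 [action=NET_RX]
ksoftirqd/1-15 [001] ..s. 466.635860: softirq_entry: vec=3 [action=NET_RX]
""".splitlines()

print(raise_to_entry_latencies(SAMPLE))  # [(1, 3, 909), (1, 3, 3)]
```

Taking the maximum of the third field over a longer capture gives the
worst-case delay for that run.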

If I move back to the v4.4 kernel I still see similar behaviour, but the
maximum delay is in the range of 500 usec. If I move back to the v3.8 kernel
there is no packet loss, and the maximum delay between softirq_raise and
softirq_entry is 103 usec.

Is this a known issue?
I would really appreciate your help with this problem.


--
Regards
Sudip