process_backlog interruptions with 3.10.47-rt50

From: trevor_davenport
Date: Tue Feb 10 2015 - 18:32:19 EST


I've recently encountered a problem after upgrading from 3.0.57-rt82 to
3.10.47-rt50 where process_backlog gets interrupted and does not resume
for a while, which results in packets not being processed in time. I see
net_rx_action, which then calls process_backlog (as the poll method to
process the backlog of packets queued up the netif_rx) but then after the
interruption, it does not finish for about 5ms. In the older kernel it
would finish based on the priority of ksoftirqd. This is no longer the
case.

I have priorities configured so that hard interrupts are highest,
ksoftirqd next (both are SCHED_FIFO) and then my program is currently
SCHED_OTHER but I still do not see the rx softirq finish before my program
runs.

This is all on a single core powerpc device. I do not see these problems
with a net device which uses NAPI directly (as such i'm updating my driver
to use NAPI) but it seems like there is a real bug here somewhere. I have
not been able to find any mention of similar problems (perhaps few people
are using netif_rx these days).

I've attached a recording from perf which shows the problem. Specifically,
you see net_rx_action run at time 213.079014 and then it doesn't finish
until about 5ms later at time 213.084953 which i not the case on the older
kernels. It seems something has changed with softirq handling or
process_backlog needed adapted for it. My suspicion is this has something
to do with the work mentioned in 210dc110063cf040d3209fddf766f6fcafccdc34
but I'm not an expert with this area of the kernel.

Any tips or suggestions would be appreciated. Please CC me as I am not
subscribed to lkml.

Thanks,
Trevor



ethernet_idle 3065 [000] 213.078439: printk:console: [ 213.078426] send db
ethernet_idle 3065 [000] 213.078458: printk:console: [ 213.078449] ether_device xmit
ethernet_idle 3065 [000] 213.078470: net:net_dev_queue: dev=eth2 skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.078492: printk:console: [ 213.078482] enet xmit
ethernet_idle 3065 [000] 213.078524: irq:irq_handler_entry: irq=21 name=sel_enet_mac
ethernet_idle 3065 [000] 213.078532: irq:irq_handler_exit: irq=21 ret=handled
ethernet_idle 3065 [000] 213.078552: net:net_dev_xmit: dev=eth2 skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.078569: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.078603: printk:console: [ 213.078586] enet rx
irq/21-sel_enet 3058 [000] 213.078629: net:netif_rx: dev=eth2 skbaddr=(nil) len=46
irq/21-sel_enet 3058 [000] 213.078637: irq:softirq_raise: vec=3 [action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078655: printk:console: [ 213.078645] enet tx
irq/21-sel_enet 3058 [000] 213.078673: irq:softirq_entry: vec=3 [action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078691: printk:console: [ 213.078680] process_backlog entry
irq/21-sel_enet 3058 [000] 213.078702: net:netif_receive_skb: dev=eth2 skbaddr=(nil) len=46
irq/21-sel_enet 3058 [000] 213.078726: printk:console: [ 213.078711] ether_device recv
irq/21-sel_enet 3058 [000] 213.078736: printk:console: [ 213.078716] process_backlog exit 1
irq/21-sel_enet 3058 [000] 213.078747: napi:napi_poll: napi poll on napi struct (nil) for device (no_device)
irq/21-sel_enet 3058 [000] 213.078754: irq:softirq_exit: vec=3 [action=NET_RX]
irq/21-sel_enet 3058 [000] 213.078779: sched:sched_switch: prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.078965: net:net_dev_queue: dev=lo skbaddr=(nil) len=389
database 3224 [000] 213.078988: net:netif_rx: dev=lo skbaddr=(nil) len=375
database 3224 [000] 213.078996: irq:softirq_raise: vec=3 [action=NET_RX]
database 3224 [000] 213.079005: net:net_dev_xmit: dev=lo skbaddr=(nil) len=389 rc=0
database 3224 [000] 213.079014: irq:softirq_entry: vec=3 [action=NET_RX]
database 3224 [000] 213.079052: printk:console: [ 213.079026] process_backlog entry
database 3224 [000] 213.079066: net:netif_receive_skb: dev=lo skbaddr=(nil) len=375
database 3224 [000] 213.079152: net:net_dev_queue: dev=lo skbaddr=(nil) len=66
database 3224 [000] 213.079204: irq:softirq_raise: vec=1 [action=TIMER]
database 3224 [000] 213.079263: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.079284: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.079302: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.079322: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat next_pid=3197 next_prio=120
cat 3197 [000] 213.079489: sched:sched_switch: prev_comm=cat prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 next_prio=120
grep 3198 [000] 213.079638: sched:sched_switch: prev_comm=grep prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.079672: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.079694: net:netif_rx: dev=lo skbaddr=(nil) len=52
database 3224 [000] 213.079702: irq:softirq_raise: vec=3 [action=NET_RX]
database 3224 [000] 213.079711: net:net_dev_xmit: dev=lo skbaddr=(nil) len=66 rc=0
database 3224 [000] 213.079739: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.079770: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=db_ctl next_pid=3225 next_prio=120
db_ctl 3225 [000] 213.080213: irq:softirq_raise: vec=1 [action=TIMER]
db_ctl 3225 [000] 213.080270: sched:sched_switch: prev_comm=db_ctl prev_pid=3225 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.080294: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.080312: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.080332: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.080382: printk:console: [ 213.080353] ether_device poll
ethernet_idle 3065 [000] 213.080443: printk:console: [ 213.080430] send dd
ethernet_idle 3065 [000] 213.080464: printk:console: [ 213.080454] ether_device xmit
ethernet_idle 3065 [000] 213.080476: net:net_dev_queue: dev=eth2 skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.080498: printk:console: [ 213.080487] enet xmit
ethernet_idle 3065 [000] 213.080530: irq:irq_handler_entry: irq=21 name=sel_enet_mac
ethernet_idle 3065 [000] 213.080539: irq:irq_handler_exit: irq=21 ret=handled
ethernet_idle 3065 [000] 213.080560: net:net_dev_xmit: dev=eth2 skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.080577: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.080611: printk:console: [ 213.080595] enet rx
irq/21-sel_enet 3058 [000] 213.080637: net:netif_rx: dev=eth2 skbaddr=(nil) len=46
irq/21-sel_enet 3058 [000] 213.080656: printk:console: [ 213.080645] enet tx
irq/21-sel_enet 3058 [000] 213.080687: sched:sched_switch: prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.080714: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.080747: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.080801: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=D ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.080825: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.081210: irq:softirq_raise: vec=1 [action=TIMER]
dbpool 3217 [000] 213.081264: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.081286: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.081311: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.081331: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl next_pid=3222 next_prio=120
dbcl 3222 [000] 213.081581: sched:sched_switch: prev_comm=dbcl prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=cat next_pid=3197 next_prio=120
cat 3197 [000] 213.081733: sched:sched_switch: prev_comm=cat prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 next_prio=120
grep 3198 [000] 213.081853: sched:sched_switch: prev_comm=grep prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.082210: irq:softirq_raise: vec=1 [action=TIMER]
dbpool 3217 [000] 213.082264: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.082286: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.082297: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.082316: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl next_pid=3222 next_prio=120
dbcl 3222 [000] 213.082595: sched:sched_switch: prev_comm=dbcl prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.082937: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=dbcl next_pid=3222 next_prio=120
dbcl 3222 [000] 213.083207: irq:softirq_raise: vec=1 [action=TIMER]
dbcl 3222 [000] 213.083260: sched:sched_switch: prev_comm=dbcl prev_pid=3222 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.083281: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.083291: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.083310: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbcl next_pid=3222 next_prio=120
dbcl 3222 [000] 213.083455: sched:sched_switch: prev_comm=dbcl prev_pid=3222 prev_prio=120 prev_state=S ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.084211: irq:softirq_raise: vec=1 [action=TIMER]
dbpool 3217 [000] 213.084255: net:net_dev_queue: dev=lo skbaddr=(nil) len=81
dbpool 3217 [000] 213.084274: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.084297: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.084306: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.084325: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=dbpool next_pid=3217 next_prio=120
dbpool 3217 [000] 213.084345: net:netif_rx: dev=lo skbaddr=(nil) len=67
dbpool 3217 [000] 213.084354: net:net_dev_xmit: dev=lo skbaddr=(nil) len=81 rc=0
dbpool 3217 [000] 213.084495: sched:sched_switch: prev_comm=dbpool prev_pid=3217 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.084558: printk:console: [ 213.084528] ether_device poll
ethernet_idle 3065 [000] 213.084591: printk:console: [ 213.084581] WARNING 0: 1 datagram timed out!
ethernet_idle 3065 [000] 213.084650: printk:console: [ 213.084639] send df
ethernet_idle 3065 [000] 213.084668: printk:console: [ 213.084659] ether_device xmit
ethernet_idle 3065 [000] 213.084680: net:net_dev_queue: dev=eth2 skbaddr=(nil) len=60
ethernet_idle 3065 [000] 213.084702: printk:console: [ 213.084692] enet xmit
ethernet_idle 3065 [000] 213.084734: irq:irq_handler_entry: irq=21 name=sel_enet_mac
ethernet_idle 3065 [000] 213.084742: irq:irq_handler_exit: irq=21 ret=handled
ethernet_idle 3065 [000] 213.084763: net:net_dev_xmit: dev=eth2 skbaddr=(nil) len=60 rc=0
ethernet_idle 3065 [000] 213.084780: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=R+ ==> next_comm=irq/21-sel_enet next_pid=3058 next_prio=49
irq/21-sel_enet 3058 [000] 213.084815: printk:console: [ 213.084798] enet rx
irq/21-sel_enet 3058 [000] 213.084841: net:netif_rx: dev=eth2 skbaddr=(nil) len=46
irq/21-sel_enet 3058 [000] 213.084860: printk:console: [ 213.084849] enet tx
irq/21-sel_enet 3058 [000] 213.084892: sched:sched_switch: prev_comm=irq/21-sel_enet prev_pid=3058 prev_prio=49 prev_state=S ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.084925: printk:console: [ 213.084910] process_backlog exit 1
database 3224 [000] 213.084936: napi:napi_poll: napi poll on napi struct (nil) for device (no_device)
database 3224 [000] 213.084945: irq:softirq_raise: vec=3 [action=NET_RX]
database 3224 [000] 213.084953: irq:softirq_exit: vec=3 [action=NET_RX]
database 3224 [000] 213.084962: irq:softirq_entry: vec=3 [action=NET_RX]
database 3224 [000] 213.084979: printk:console: [ 213.084969] process_backlog entry
database 3224 [000] 213.084990: net:netif_receive_skb: dev=lo skbaddr=(nil) len=52
database 3224 [000] 213.085025: net:netif_receive_skb: dev=eth2 skbaddr=(nil) len=46
database 3224 [000] 213.085045: printk:console: [ 213.085032] ether_device recv
database 3224 [000] 213.085057: net:netif_receive_skb: dev=lo skbaddr=(nil) len=67
database 3224 [000] 213.085072: net:netif_receive_skb: dev=eth2 skbaddr=(nil) len=46
database 3224 [000] 213.085090: printk:console: [ 213.085078] ether_device recv
database 3224 [000] 213.085109: printk:console: [ 213.085099] process_backlog exit 1
database 3224 [000] 213.085118: napi:napi_poll: napi poll on napi struct (nil) for device (no_device)
database 3224 [000] 213.085124: irq:softirq_exit: vec=3 [action=NET_RX]
database 3224 [000] 213.085199: irq:softirq_raise: vec=1 [action=TIMER]
database 3224 [000] 213.085247: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.085268: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.085284: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.085304: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat next_pid=3197 next_prio=120
cat 3197 [000] 213.085488: sched:sched_switch: prev_comm=cat prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 next_prio=120
grep 3198 [000] 213.085656: sched:sched_switch: prev_comm=grep prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.085689: sched:sched_switch: prev_comm=ethernet_idle prev_pid=3065 prev_prio=120 prev_state=S ==> next_comm=database next_pid=3224 next_prio=120
database 3224 [000] 213.085812: net:net_dev_queue: dev=lo skbaddr=(nil) len=66
database 3224 [000] 213.085835: net:netif_rx: dev=lo skbaddr=(nil) len=52
database 3224 [000] 213.085843: irq:softirq_raise: vec=3 [action=NET_RX]
database 3224 [000] 213.085852: net:net_dev_xmit: dev=lo skbaddr=(nil) len=66 rc=0
database 3224 [000] 213.085866: irq:softirq_entry: vec=3 [action=NET_RX]
database 3224 [000] 213.085905: printk:console: [ 213.085878] process_backlog entry
database 3224 [000] 213.085920: net:netif_receive_skb: dev=lo skbaddr=(nil) len=52
database 3224 [000] 213.086000: printk:console: [ 213.085979] process_backlog exit 1
database 3224 [000] 213.086016: napi:napi_poll: napi poll on napi struct (nil) for device (no_device)
database 3224 [000] 213.086024: irq:softirq_exit: vec=3 [action=NET_RX]
database 3224 [000] 213.086206: irq:softirq_raise: vec=1 [action=TIMER]
database 3224 [000] 213.086259: sched:sched_switch: prev_comm=database prev_pid=3224 prev_prio=120 prev_state=R+ ==> next_comm=ksoftirqd/0 next_pid=3 next_prio=2
ksoftirqd/0 3 [000] 213.086281: irq:softirq_entry: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.086303: irq:softirq_exit: vec=1 [action=TIMER]
ksoftirqd/0 3 [000] 213.086323: sched:sched_switch: prev_comm=ksoftirqd/0 prev_pid=3 prev_prio=2 prev_state=S ==> next_comm=cat next_pid=3197 next_prio=120
cat 3197 [000] 213.086439: sched:sched_switch: prev_comm=cat prev_pid=3197 prev_prio=120 prev_state=S ==> next_comm=grep next_pid=3198 next_prio=120
grep 3198 [000] 213.086539: sched:sched_switch: prev_comm=grep prev_pid=3198 prev_prio=120 prev_state=S ==> next_comm=ethernet_idle next_pid=3065 next_prio=120
ethernet_idle 3065 [000] 213.086593: printk:console: [ 213.086565] ether_device poll
ethernet_idle 3065 [000] 213.086622: printk:console: [ 213.086612] unmatched dd