Re: INFO: rcu detected stall in wg_packet_tx_worker

From: Jason A. Donenfeld
Date: Mon May 04 2020 - 07:23:33 EST


So in spite of this Syzkaller bug being unrelated in the end, I've
continued to think about the stacktrace a bit, and combined with some
other [potentially false alarm] bug reports I'm trying to wrap my head
around, I'm a bit curious about the ideal usage of the udp_tunnel API.

All the uses I've seen in the kernel (including wireguard) follow this pattern:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
...
udp_tunnel_xmit_skb(..., sock, ...);
rcu_read_unlock_bh();

udp_tunnel_xmit_skb calls iptunnel_xmit, which winds up in the usual
ip_local_out path, which eventually calls some other device's
ndo_start_xmit, or gets queued up in a qdisc. Calls to
udp_tunnel_xmit_skb aren't exactly cheap. So I wonder: is holding the
RCU read lock for all that time really a good thing?
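
To make that concrete, here's roughly what I have in mind for that
first pattern, fleshed out into a little function. The struct and
field names (my_tunnel, ->sock, ->rt, and so on) are invented for
illustration, and the route/addressing details are hand-waved, but it
shows the whole encapsulation and transmit path running inside the RCU
BH read-side section:

#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/udp_tunnel.h>

struct my_tunnel {
        struct sock __rcu *sock;  /* written by the owner, read under RCU */
        struct rtable *rt;        /* cached route, lookup omitted here */
        __be32 saddr, daddr;
        __be16 sport, dport;
};

static void my_tunnel_xmit(struct my_tunnel *tun, struct sk_buff *skb)
{
        struct sock *sock;

        rcu_read_lock_bh();
        sock = rcu_dereference_bh(tun->sock);
        if (unlikely(!sock)) {
                kfree_skb(skb);
                goto out;
        }

        /* Everything below runs with the RCU BH read lock held: the UDP
         * encapsulation, ip_local_out, and potentially another device's
         * ndo_start_xmit or a qdisc enqueue. */
        udp_tunnel_xmit_skb(tun->rt, sock, skb, tun->saddr, tun->daddr,
                            0 /* tos */, 64 /* ttl */, 0 /* df */,
                            tun->sport, tun->dport,
                            false /* xnet */, false /* nocheck */);
out:
        rcu_read_unlock_bh();
}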

A different pattern, which avoids holding the RCU lock across the
transmit itself, would be:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
sock_hold(sock);
rcu_read_unlock_bh();
...
udp_tunnel_xmit_skb(..., sock, ...);
sock_put(sock);
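
Fleshed out in the same invented driver, that would be something like
the following. One thing I'm assuming here is that a bare sock_hold()
is fine because the writer side only drops its own reference after an
RCU grace period, so the refcount can't already have hit zero inside
the read-side section:

static void my_tunnel_xmit_ref(struct my_tunnel *tun, struct sk_buff *skb)
{
        struct sock *sock;

        rcu_read_lock_bh();
        sock = rcu_dereference_bh(tun->sock);
        if (sock)
                sock_hold(sock);  /* refcount still >= 1 here, see below */
        rcu_read_unlock_bh();

        if (unlikely(!sock)) {
                kfree_skb(skb);
                return;
        }

        /* The RCU lock is no longer held; the reference we took keeps
         * the socket alive across the (potentially long) transmit path. */
        udp_tunnel_xmit_skb(tun->rt, sock, skb, tun->saddr, tun->daddr,
                            0, 64, 0, tun->sport, tun->dport, false, false);
        sock_put(sock);
}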

This seems better, but I wonder if it has some drawbacks too. For
example, sock_put has some comment that warns against incrementing the
refcount in response to forwarded packets. And if taking the reference
isn't actually necessary, it's marginally more costly than the first
pattern, since it adds a pair of atomic operations per packet.
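
For completeness, both patterns assume a teardown path roughly along
these lines (again with invented names), where the writer clears the
pointer, waits out the read-side sections, and only then drops its own
reference:

static void my_tunnel_release_sock(struct my_tunnel *tun)
{
        struct sock *old;

        /* Assumes the caller serializes writers to tun->sock. */
        old = rcu_dereference_protected(tun->sock, 1);
        RCU_INIT_POINTER(tun->sock, NULL);

        synchronize_net();  /* wait for rcu_read_lock_bh() readers */

        if (old)
                sock_put(old);  /* may free the socket now */
}

With a teardown like that, the refcount in the second pattern isn't
needed for correctness at all; its only job would be to let the RCU
read-side section be short.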

Any opinions about this?

Jason