Multicast from underlying MACVLAN interface towards MACVLAN

From: Alexander Sverdlin
Date: Tue May 12 2020 - 10:23:52 EST


Dear Network Core developers!

I've been debugging an issue with Multicast replies from underlying
interface of MACVLAN towards MACVLAN. These SKBs never contain a MAC header
and therefore cannot be properly processed by MACVLAN.

The usecase is following:
eth1 <-- eth1.212 <-- macvlan@xxxxxxxx (in bridge mode)

As I understand the problem, it actually plays no role, that there is an intermediate VLAN interface.
The problem is, if macvlan@xxxxxxxx sends Router Solicitation these SKBs are received on eth1.212,
but the corresponding multicast Router Advertisements are not received on macvlan@xxxxxxxxx

I've tracked the problem down to the following incompatibility between MACVLAN code and IP code...

One the one hand, MACVLAN always expects ethernet header:

static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
{
struct macvlan_port *port;
struct sk_buff *skb = *pskb;
const struct ethhdr *eth = eth_hdr(skb);
...

port = macvlan_port_get_rcu(skb->dev);
if (is_multicast_ether_addr(eth->h_dest)) {

One the other hand, IP doesn't populate ethernet header for multicast loopback transmission:

int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
skb_reset_mac_header(skb);
__skb_pull(skb, skb_network_offset(skb));
skb->pkt_type = PACKET_LOOPBACK;
skb->ip_summed = CHECKSUM_UNNECESSARY;
WARN_ON(!skb_dst(skb));
skb_dst_force(skb);
netif_rx_ni(skb);

Unicast however works fine, because of:

int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb)
{
struct net_device *dev = neigh->dev;
unsigned int seq;
int err;

do {
__skb_pull(skb, skb_network_offset(skb));
seq = read_seqbegin(&neigh->ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
neigh->ha, NULL, skb->len);
} while (read_seqretry(&neigh->ha_lock, seq));

if (err >= 0)
err = dev_queue_xmit(skb);

I've also collected some stack traces and SKB dumps to illustrate the problem
(I've instrumented macvlan_handle_frame() and eth_header() to understand when
the ethernet header has been generated):

macvlan_handle_frame() receives Router Advertisement, but cannot forward
without Ethernet header:

skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb headroom: 00000020: 08 0f 00 00 00 00 00 00
skb linear: 00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear: 00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear: 00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear: 00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear: 00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear: 00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
<IRQ>
dump_stack+0x69/0x9b
macvlan_handle_frame+0x321/0x425 [macvlan]
? macvlan_forward_source+0x110/0x110 [macvlan]
__netif_receive_skb_core+0x545/0xda0
? ip6_mc_input+0x103/0x250 [ipv6]
? ipv6_rcv+0xe1/0xf0 [ipv6]
? __netif_receive_skb_one_core+0x36/0x70
__netif_receive_skb_one_core+0x36/0x70
process_backlog+0x97/0x140
net_rx_action+0x1eb/0x350
__do_softirq+0xe3/0x383
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.4+0x4e/0x50
netif_rx_ni+0x60/0xd0
dev_loopback_xmit+0x83/0xf0
ip6_finish_output2+0x575/0x590 [ipv6]
? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
? __ip6_make_skb+0x38d/0x680 [ipv6]
? ip6_output+0x6c/0x140 [ipv6]
ip6_output+0x6c/0x140 [ipv6]
ip6_send_skb+0x1e/0x60 [ipv6]
rawv6_sendmsg+0xc4b/0xe10 [ipv6]
? proc_put_long+0xd0/0xd0
? rw_copy_check_uvector+0x4e/0x110
? sock_sendmsg+0x36/0x40
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x2b6/0x2d0
? proc_dointvec+0x23/0x30
? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
? dev_forward_change+0x130/0x130 [ipv6]
? _raw_spin_unlock+0x12/0x30
? proc_sys_call_handler.isra.14+0x9f/0x110
? __call_rcu+0x213/0x510
? get_max_files+0x10/0x10
? trace_hardirqs_on+0x2c/0xe0
? __sys_sendmsg+0x63/0xa0
__sys_sendmsg+0x63/0xa0
do_syscall_64+0x6c/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Later when the same RA is being transmitted neigh_connected_output(), this is the first
time Ethernet header is being generated for this packet, but this is towards "world", not
the internal MACVLAN bridge:

skb len=110 headroom=26 headlen=110 tailroom=56
mac=(-1,-1) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=0 iif=0
dev name=etha01.212 feat=0x0x0000000040005000
sk family=10 type=3 proto=58
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00
skb linear: 00000000: 33 33 00 00 00 01 02 40 43 80 00 00 86 dd 60 09
skb linear: 00000010: 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 00 40
skb linear: 00000020: 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 00 00
skb linear: 00000030: 00 00 00 00 00 01 86 00 61 00 40 00 00 2d 00 00
skb linear: 00000040: 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c 00 00
skb linear: 00000050: 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 00 00
skb linear: 00000060: 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
dump_stack+0x69/0x9b
debug_hdr+0x4c/0x60
eth_header+0x71/0xe0
vlan_dev_hard_header+0x58/0x140 [8021q]
neigh_connected_output+0xa9/0x100
ip6_finish_output2+0x24a/0x590 [ipv6]
? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
? __ip6_make_skb+0x38d/0x680 [ipv6]
? ip6_output+0x6c/0x140 [ipv6]
ip6_output+0x6c/0x140 [ipv6]
ip6_send_skb+0x1e/0x60 [ipv6]
rawv6_sendmsg+0xc4b/0xe10 [ipv6]
? proc_put_long+0xd0/0xd0
? rw_copy_check_uvector+0x4e/0x110
? sock_sendmsg+0x36/0x40
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x2b6/0x2d0
? proc_dointvec+0x23/0x30
? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
? dev_forward_change+0x130/0x130 [ipv6]
? _raw_spin_unlock+0x12/0x30
? proc_sys_call_handler.isra.14+0x9f/0x110
? __call_rcu+0x213/0x510
? get_max_files+0x10/0x10
? trace_hardirqs_on+0x2c/0xe0
? __sys_sendmsg+0x63/0xa0
__sys_sendmsg+0x63/0xa0
do_syscall_64+0x6c/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

I would appreciate any hint, how to approach this problem! I can try to come up with a patch,
but as this is so central thing in the IP protocol, I'd like to hear some opinions first...

--
Best regards,
Alexander Sverdlin.