Re: [PATCH] openvswitch: Orphan frags before sending to userspace via Netlink to avoid guest stall

From: Thomas Graf
Date: Fri Mar 07 2014 - 13:05:35 EST


On 03/07/2014 06:19 PM, Pravin Shelar wrote:
> On Fri, Mar 7, 2014 at 7:58 AM, Thomas Graf <tgraf@xxxxxxxxxx> wrote:
>> On 03/07/2014 05:46 AM, Pravin Shelar wrote:

>>> But I found a bug in the datapath user-space queue code. I am not sure how
>>> this can work with skb fragments and an MMAP netlink socket.
>>> Here is what happens: OVS allocates a netlink skb, adds fragments to the
>>> skb using skb_zerocopy(), then calls genlmsg_unicast().
>>> But if the netlink socket is mmapped, the netlink send path queues only the
>>> netlink-allocated skb->head (the linear data of the skb) and ignores the
>>> skb frags.
>>>
>>> Currently this is not a problem with OVS vswitchd since it does not use
>>> netlink MMAP sockets. But if vswitchd starts using an MMAP netlink socket,
>>> it can break.
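A minimal userspace sketch of the failure mode described above, assuming a simplified model in which a message is a linear head plus a list of fragments. `struct model_skb`, `deliver_full()` and `deliver_head_only()` are hypothetical names invented for illustration, not kernel APIs; the head-only path stands in for a send path that queues only skb->head.

```c
/* Hypothetical userspace model, not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FRAGS 4

/* Stand-in for an skb: a linear head plus a list of fragments. */
struct model_skb {
	const char *head;                   /* linear data (like skb->head) */
	size_t head_len;
	const char *frags[MODEL_MAX_FRAGS]; /* paged fragments */
	size_t frag_len[MODEL_MAX_FRAGS];
	size_t nr_frags;
};

/* Total message length: linear part plus every fragment. */
static size_t model_skb_len(const struct model_skb *skb)
{
	size_t len = skb->head_len;

	assert(skb->nr_frags <= MODEL_MAX_FRAGS);
	for (size_t i = 0; i < skb->nr_frags; i++)
		len += skb->frag_len[i];
	return len;
}

/* Correct delivery: linearize head and fragments into the ring frame.
 * Returns the number of bytes written, or 0 if the frame is too small. */
static size_t deliver_full(const struct model_skb *skb, char *frame,
			   size_t frame_size)
{
	size_t off;

	if (model_skb_len(skb) > frame_size)
		return 0;
	memcpy(frame, skb->head, skb->head_len);
	off = skb->head_len;
	for (size_t i = 0; i < skb->nr_frags; i++) {
		memcpy(frame + off, skb->frags[i], skb->frag_len[i]);
		off += skb->frag_len[i];
	}
	return off;
}

/* Problematic delivery: copies only the linear head and silently
 * drops the fragments. */
static size_t deliver_head_only(const struct model_skb *skb, char *frame,
				size_t frame_size)
{
	if (skb->head_len > frame_size)
		return 0;
	memcpy(frame, skb->head, skb->head_len);
	return skb->head_len;
}
```

With a 3-byte head and a 7-byte fragment, deliver_full() writes all 10 bytes while deliver_head_only() writes only the 3 head bytes, so the fragment payload never reaches user space in this model.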


>> The secret is out ;-)
>>
>> I was very surprised too when I noticed that it worked. It's not just
>> OVS, it's nfqueue as well. The reason is that a netlink mmapped skb is
>> set up with a giant tailroom in netlink_ring_setup_skb():
>>
>>     skb->end = skb->tail + size;
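To make the tailroom arithmetic concrete, here is a hypothetical userspace model of the skb pointers using offsets instead of real pointers. Only the `end = tail + size` assignment comes from netlink_ring_setup_skb(); `struct model_skb_ptrs`, `model_ring_setup()`, `model_tailroom()` and `model_put()` are invented for illustration. For a linear skb, the kernel's skb_tailroom() is likewise end minus tail.

```c
/* Hypothetical userspace model of skb buffer offsets, not kernel code. */
#include <assert.h>
#include <stddef.h>

struct model_skb_ptrs {
	size_t data; /* start of valid data */
	size_t tail; /* end of valid data */
	size_t end;  /* end of the usable buffer */
};

/* Mirrors the quoted assignment: an empty skb whose tailroom is the
 * whole frame payload area of `size` bytes. */
static void model_ring_setup(struct model_skb_ptrs *p, size_t size)
{
	p->data = 0;
	p->tail = p->data;       /* nothing written yet */
	p->end = p->tail + size; /* skb->end = skb->tail + size; */
}

/* For a linear skb, tailroom is end - tail. */
static size_t model_tailroom(const struct model_skb_ptrs *p)
{
	assert(p->end >= p->tail);
	return p->end - p->tail;
}

/* Reserve len bytes at the tail, loosely analogous to skb_put(), except
 * that an overflow is reported (returns 0) instead of panicking. */
static int model_put(struct model_skb_ptrs *p, size_t len)
{
	if (len > model_tailroom(p))
		return 0;
	p->tail += len;
	return 1;
}
```

With a 2048-byte frame payload, a 1500-byte write fits and leaves 548 bytes of tailroom; a further 600-byte write is refused.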

> For the OVS use case, the size is the linear part of the skb, so I think
> it will fail for an mmap netlink socket.

Could you rephrase? I'm not sure I understand correctly.

The tailroom size equals the configured frame payload size of
the ring buffer. So as long as the chosen frame size is large
enough to hold whatever pieces come out of skb_gso_segment(), we are
fine. That said, I agree that we should fix this properly before we
enable mmap on the OVS user space side.
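A rough sketch of the sizing condition, assuming each GSO segment carries `hdr_len` bytes of packet headers plus at most `mss` bytes of payload, and that each upcall adds a fixed, hypothetical `nl_overhead` for the netlink/genetlink headers and attributes. The helper names are illustrative only; the real overhead depends on which attributes are attached.

```c
/* Hypothetical sizing check, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of segments when payload_len bytes are split into chunks of
 * at most mss bytes (an empty payload still yields one packet). */
static size_t segment_count(size_t payload_len, size_t mss)
{
	assert(mss > 0);
	if (payload_len == 0)
		return 1;
	return (payload_len + mss - 1) / mss;
}

/* Size of the largest segment: headers plus one full chunk, or the
 * whole payload if it is smaller than one chunk. */
static size_t largest_segment(size_t hdr_len, size_t payload_len, size_t mss)
{
	assert(mss > 0);
	return hdr_len + (payload_len < mss ? payload_len : mss);
}

/* True if every segment, plus the assumed per-upcall netlink overhead,
 * fits into a ring frame whose payload area is frame_payload bytes. */
static bool segments_fit(size_t hdr_len, size_t payload_len, size_t mss,
			 size_t nl_overhead, size_t frame_payload)
{
	return largest_segment(hdr_len, payload_len, mss) + nl_overhead <=
	       frame_payload;
}
```

For a 64000-byte TCP payload with an MSS of 1448 and 66 bytes of headers, this gives 45 segments of at most 1514 bytes, so under an assumed 64-byte netlink overhead a 2048-byte frame payload suffices while a 1024-byte one does not.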