Re: [RFC PATCH] packet: mark ring entry as in-use inside spin_lock to prevent RX ring overrun

From: Willem de Bruijn
Date: Wed Apr 04 2018 - 09:50:00 EST

On Tue, Apr 3, 2018 at 11:55 PM, Jon Rosen <jrosen@xxxxxxxxx> wrote:
> Fix PACKET_RX_RING bug for versions TPACKET_V1 and TPACKET_V2 which
> casues the ring to get corrupted by allowing multiple kernel threads
> to claim ownership of the same ring entry, Mark the ring entry as
> already being used within the spin_lock to prevent other kernel
> threads from reusing the same entry before it's fully filled in,
> passed to user space, and then eventually passed back to the kernel
> for use with a new packet.
> Note that the proposed change may modify the semantics of the
> interface between kernel space and user space in a way which may cause
> some applications to no longer work properly.

As long as TP_STATUS_USER (1) is not set, userspace should ignore
the slot..

> One issue with the above proposed change to use TP_STATUS_IN_PROGRESS
> is that the documentation of the tp_status field is somewhat
> inconsistent. In some places it's described as TP_STATUS_KERNEL(0)
> meaning the entry is owned by the kernel and !TP_STATUS_KERNEL(0)
> meaning the entry is owned by user space. In other places ownership
> by user space is defined by the TP_STATUS_USER(1) bit being set.

But indeed this example in packet_mmap.txt is problematic

if (status == TP_STATUS_KERNEL)
retval = poll(&pfd, 1, timeout);

It does not really matter whether the docs are possibly inconsistent and
which one is authoritative. Examples like the above make it likely that
some user code expects such code to work.

> +++ b/net/packet/af_packet.c
> @@ -2287,6 +2287,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> if (po->stats.stats1.tp_drops)
> status |= TP_STATUS_LOSING;
> }
> +
> + /*
> + * Mark this entry as TP_STATUS_IN_PROGRESS to prevent other
> + * kernel threads from re-using this same entry.
> + */

No need to reinterpret existing flags. tp_status is a u32 with
sufficient undefined bits.

> + if (po->tp_version <= TPACKET_V2)
> + __packet_set_status(po, h.raw, TP_STATUS_IN_PROGRESS);
> +
> po->stats.stats1.tp_packets++;
> if (copy_skb) {
> status |= TP_STATUS_COPY;
> --
> 2.10.3.dirty