Re: [PATCH v2] packet: track ring entry use using a shadow ring to prevent RX ring overrun

From: Willem de Bruijn
Date: Tue May 22 2018 - 10:53:29 EST


On Tue, May 22, 2018 at 10:12 AM, Jon Rosen (jrosen) <jrosen@xxxxxxxxx> wrote:
> On Monday, May 21, 2018 2:17 PM, Jon Rosen (jrosen) <jrosen@xxxxxxxxx> wrote:
>> On Monday, May 21, 2018 1:07 PM, Willem de Bruijn
>> <willemdebruijn.kernel@xxxxxxxxx> wrote:
>>> On Mon, May 21, 2018 at 8:57 AM, Jon Rosen (jrosen) <jrosen@xxxxxxxxx> wrote:
>
> ...snip...
>
>>>
>>> A setsockopt for userspace to signal a stricter interpretation of
>>> tp_status to elide the shadow hack could then be considered.
>>> It's not pretty. Either way, no full new version is required.
>>>
>>>> As much as I would like to find a solution that doesn't require
>>>> the spin lock I have yet to do so. Maybe the answer is that
>>>> existing applications will need to suffer the performance impact
>>>> but a new version or option for TPACKET_V1/V2 could be added to
>>>> indicate strict adherence of the TP_STATUS_USER bit and then the
>>>> original diffs could be used.
>
> It looks like adding new socket options is pretty rare so I
> wonder if a better option might be to define a new TP_STATUS_XXX
> bit which would signal from a userspace application to the kernel
> that it strictly interprets the TP_STATUS_USER bit to determine
> ownership.
>
> Todays applications set tp_status = TP_STATUS_KERNEL(0) for the
> kernel to pick up the entry. We could define a new value to pass
> ownership as well as one to indicate to other kernel threads that
> an entry is inuse:
>
> #define TP_STATUS_USER_TO_KERNEL (1 << 8)
> #define TP_STATUS_INUSE (1 << 9)
>
> If the kernel sees tp_status == TP_STATUS_KERNEL then it should
> use the shadow method for tacking ownership. If it sees tp_status
> == TP_STATUS_USER_TO_KERNEL then it can use the TP_STATUS_INUSE
> method.

Interesting idea. Userspace would have to consistently follow the
new behavior for all slots, which is hard to enforce. And as this is
checked at runtime, the kernel always has to defensively allocate
the shadow memory. I do think that a per-ring option set before
ring allocation is simpler. Either way, this optimization would be a
separate follow-on patch.