Re: epoll_wait() performance

From: Willem de Bruijn
Date: Mon Dec 02 2019 - 11:47:45 EST


On Mon, Dec 2, 2019 at 7:24 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
> > Sent: 30 November 2019 13:30
> > On Sat, Nov 30, 2019 at 02:07 AM CET, Eric Dumazet wrote:
> > > On 11/28/19 2:17 AM, David Laight wrote:
> ...
> > >> How can you do that when all the UDP flows have different destination port numbers?
> > >> These are message flows not idempotent requests.
> > >> I don't really want to collect the packets before they've been processed by IP.
> > >>
> > >> I could write a driver that uses kernel udp sockets to generate a single message queue
> > >> than can be efficiently processed from userspace - but it is a faff compiling it for
> > >> the systems kernel version.
> > >
> > > Well if destinations ports are not under your control,
> > > you also could use AF_PACKET sockets, no need for 'UDP sockets' to receive UDP traffic,
> > > especially it the rate is small.
> >
> > Alternatively, you could steer UDP flows coming to a certain port range
> > to one UDP socket using TPROXY [0, 1].
>
> I don't think that can work, we don't really know the list of valid UDP port
> numbers ahead of time.

How about -j REDIRECT? That does not require all ports to be known
ahead of time.

> > TPROXY has the same downside as AF_PACKET, meaning that it requires at
> > least CAP_NET_RAW to create/set up the socket.
>
> CAP_NET_RAW wouldn't be a problem - we already send from a 'raw' socket.

One other issue when comparing UDP and packet sockets is IP
defragmentation. That is critical code that is not at all trivial to
duplicate in userspace.

Even when choosing packet sockets, which normally would not
defragment, there is a trick. A packet socket with fanout and flag
PACKET_FANOUT_FLAG_DEFRAG will defragment before fanout.
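
For illustration, a minimal sketch of that setup, following the
PACKET_FANOUT description in packet(7). The helper names
make_fanout_arg() and open_defrag_fanout_socket(), the choice of
PACKET_FANOUT_HASH as the fanout mode and the use of SOCK_DGRAM are my
own assumptions for the example, not something from this thread.
Creating the socket needs CAP_NET_RAW.

```c
#include <arpa/inet.h>        /* htons */
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_IP */
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* */
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Build the PACKET_FANOUT option value: the low 16 bits carry the
 * fanout group id, the high 16 bits carry the fanout mode ORed with
 * any flags. (Hypothetical helper, named for this example only.)
 */
uint32_t make_fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
    return (uint32_t)group_id | ((uint32_t)type_and_flags << 16);
}

/*
 * Open a packet socket for IPv4 and join fanout group `group_id`
 * with defragmentation requested before fanout.
 * Returns the fd, or -1 with errno set.
 * (Hypothetical helper; hash mode is an assumption of this sketch.)
 */
int open_defrag_fanout_socket(uint16_t group_id)
{
    /* SOCK_DGRAM: the link-level header is stripped on receive. */
    int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
    if (fd < 0)
        return -1;

    uint32_t arg = make_fanout_arg(group_id,
                                   PACKET_FANOUT_HASH |
                                   PACKET_FANOUT_FLAG_DEFRAG);

    if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0) {
        int saved = errno;   /* keep the setsockopt error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A fanout group with a single member is enough if the only goal is the
defragmentation; further sockets can join the same group id with the
same mode and flags to spread the load.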