Re: [patch, rfc] lt-epoll ( level triggered epoll ) ...

From: Davide Libenzi (davidel@xmailserver.org)
Date: Fri Mar 14 2003 - 14:01:04 EST


On Fri, 14 Mar 2003, Tim Smith wrote:

> On Fri, 14 Mar 2003, Davide Libenzi wrote:
> > See, this is a free world, and I very much respect your opinion. On the
> > other side you might want to actually *read* the kqueue man page and find
> > out of its 24590 flags, where 99% of its users will use only 1% of its
> > functionality. Talking about overbloating. You might also want to know
>
> Wow...that does sound overbloated. Simpler is usually better in this kind
> of thing, because 99% of the users will be doing the same thing: a lot of
> TCP connections. From what I've seen so far, I'm very much looking forward
> to your epoll stuff.
>
> However, just for the heck of it, let me throw out a (probably stupid) idea
> for the ultimate in non-overbloated interfaces for handling a ton of TCP
> connections in the (probably most) common case of those connections all
> being to the same port. I've not looked into the kernel at all to see if
> this would actually be feasible...just speculating based on what I'd like
> as someone writing a server that I'd like to have handle 100k TCP
> connections on commodity hardware.

The only problem with having 100K connections under epoll is RAM. You need
100K sets of socket buffers, 100K pieces of connection "status", ...
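
To put a rough number on the RAM point, here is a back-of-the-envelope
sketch. The struct, the function name and the sizes plugged in below are
assumptions for illustration only. Real per-socket memory depends on the
kernel version, the buffer settings and how much data is actually queued,
so treat the result as a worst-case bound, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection costs. None of these numbers come from
 * the kernel; they are placeholders chosen by the caller. */
struct conn_cost {
    uint64_t rcvbuf_bytes;    /* assumed receive buffer per socket */
    uint64_t sndbuf_bytes;    /* assumed send buffer per socket */
    uint64_t app_state_bytes; /* assumed application "status" per connection */
};

/* Worst-case bytes needed if every connection fills both of its
 * buffers at the same time. */
static uint64_t worst_case_bytes(uint64_t nconn, const struct conn_cost *c)
{
    assert(c != NULL);
    return nconn * (c->rcvbuf_bytes + c->sndbuf_bytes + c->app_state_bytes);
}
```

With assumed 16 KiB receive and send buffers plus 256 bytes of
application state, worst_case_bytes(100000, &cost) comes to 3302400000
bytes, i.e. roughly 3.3 GB if every buffer were full at once.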

> How about an option to put a bound socket in a mode I'll call TCP Datagram
> Mode (TDM). You can listen() on a TDM socket. When you accept() on a TDM
> socket, you get a socket for the new connection, just like now. However,
> that socket is only used for writing to the connection.
>
> When data is available to read on the connection, instead of getting POLLIN
> on the connection socket, you get a new event on the listen socket: POLLSDG
> (SDG == "stream datagram"...generalization of "TCP Datagram"). You can then
> use recvmsg on the listen socket, and that gives you a chunk of data from
> one of the connections. The ancillary data tells you what connection the
> data is from.
>
> With this interface, plain old poll() should be good enough. For reading,
> you are only poll()ing on the listen socket. You only need to poll() on the
> write sockets if you fill up output buffers. So, most of the time, poll()
> would only be used on one socket. Even plain old poll() scales well
> to 1. :-)
>
> (Actually...it might even be reasonable to use sendmsg() on the listen
> socket to send data, too, and then get rid of the whole accept() thing for
> TDM sockets. Basically, turn multiple TCP connections into a reliable form
> of UDP from the application's point of view)

This will work too. Sadly, you have to abandon POSIX semantics for those
sockets. The idea of detaching a new socket through an accept is to have
the abstraction of one file/socket per connection/client.
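
For comparison, a minimal sketch of how the one-fd-per-connection model
can still answer "which connection has data?" from a single wait call,
using level-triggered epoll. The helper names watch_conn() and recv_any()
are made up for this example and are not part of any API; only
epoll_ctl(), epoll_wait() and read() are real calls. Error handling is
reduced to the bare minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register one connection fd for read readiness. Without EPOLLET the
 * registration is level-triggered. The fd is stored as the user data
 * so the event identifies its connection. */
static int watch_conn(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for one readable connection and read a chunk from it.
 * Returns the byte count (0 on EOF) and sets *from to the connection
 * fd, or returns -1 on error or timeout. */
static ssize_t recv_any(int epfd, int *from, void *buf, size_t len,
                        int timeout_ms)
{
    struct epoll_event ev;

    assert(from != NULL);
    if (epoll_wait(epfd, &ev, 1, timeout_ms) <= 0)
        return -1;
    *from = ev.data.fd;
    return read(ev.data.fd, buf, len);
}
```

Because the registration is level-triggered, a connection that still has
unread bytes is reported again by the next recv_any() call, so a short
read does not lose the event.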

- Davide
