Re: ioevent queues (was Re: Proposed new poll2() syscall)

Richard Gooch (rgooch@atnf.CSIRO.AU)
Sun, 24 Aug 1997 15:56:05 +1000


Dean Gaudet writes:
> Warning, this is long, but I think worth it. If you've heard of NT's
> completion ports, that's where I'm heading.
>
> Here's something that's more ambitious than a new poll(), but is also
> really important for scalable I/O. The problem with both poll() and
> select() is that they require the kernel and the user to deal with a
> list of all possible events, rather than a list of exactly the events
> that have occurred. In a multithreaded/multiprocess application you get
> into even worse situations where multiple workers get the same (or
> similar) list of events and then fight with each other over who will do
> what.
[...]
> Now suppose you had a magic pipe that spit out structures looking like
> this:
>
> struct ioevent {
>     void *user_supplied_pointer;
> };
>
> The ioevent is an indication that some asynchronous i/o event is now
> ready. It could be that a socket is ready for accept, or that a
> descriptor is ready for writing (or even cooler things that are
> difficult to multiplex in unix right now, such as a child having
> exited). The user_supplied_pointer is just that: a pointer that you
> told the kernel to associate with a particular descriptor (probably
> through fcntl()).
[...]
> Here's my suggested API:
>
> struct ioevent {
>     void *ioe_user;    /* user_supplied_pointer was for pedagogical purposes */
>     int   ioe_result;  /* system call result code */
>     int   ioe_errno;   /* system call errno */
> };
>
> ioeventpipe (int nevents)
> - returns a descriptor to an ioeventpipe capable of handling nevents
> outstanding (possibly uncompleted) events.
>
> fcntl (fd, F_SETIOEVENTPIPE, ioeventpipe_fd, user)
> - all i/o on fd will be done asynchronously and results will be
> queued on ioeventpipe
>
> fcntl (fd, F_GETIOEVENTPIPE, &ioeventpipe_fd, &user)
> - return the associated setting
>
> New errnos:
> EIOEVENTFULL possibly returned by any i/o operation on a
> descriptor which is attached to an ioeventpipe. This is
> returned when the ioeventpipe is full. In this case the
> i/o has not been scheduled.
>
> EIOEVENTQUEUED returned by any i/o operation on a descriptor which
> was successfully queued on its ioeventpipe.
>
> The system call happens asynchronously, and all buffers or other
> structures given to it are considered "in use" until the kernel
> completes the call and stuffs the result code and errno into the
> struct ioevent. This eliminates the need to make the I/O call twice --
> i.e. once to cause it to be queued, and then once to get the result.
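
To make the intended usage concrete, here is a rough sketch of how a
server loop might drive the proposed interface. None of this exists in
any kernel: ioeventpipe(), F_SETIOEVENTPIPE, EIOEVENTQUEUED and struct
ioevent are just the names from the proposal above, the constant values
are placeholders, I'm assuming completed events are drained with an
ordinary read() on the event pipe descriptor, and new_connection(),
process_data() and handle_error() are stand-in application helpers.

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    /* Hypothetical constants from the proposal -- values are placeholders. */
    #define F_SETIOEVENTPIPE 1024
    #define EIOEVENTQUEUED    516
    #define EIOEVENTFULL      517

    struct ioevent {
            void *ioe_user;         /* pointer registered via F_SETIOEVENTPIPE */
            int   ioe_result;       /* system call result code */
            int   ioe_errno;        /* system call errno */
    };

    struct connection {
            int  fd;
            char buf[4096];
    };

    /* Proposed syscall and stand-in application helpers (all hypothetical). */
    extern int ioeventpipe(int nevents);
    extern struct connection *new_connection(int listen_fd);
    extern void process_data(struct connection *c, int nbytes);
    extern void handle_error(struct connection *c, int error);

    void serve(int listen_fd)
    {
            /* Allocate a queue able to hold 256 outstanding events. */
            int evpipe = ioeventpipe(256);
            struct connection *conn = new_connection(listen_fd);

            /* Attach the descriptor (four-argument form as in the proposal):
               all I/O on it becomes asynchronous and completions are queued
               on evpipe, tagged with our pointer.  */
            fcntl(conn->fd, F_SETIOEVENTPIPE, evpipe, conn);

            /* Issue a read; errno == EIOEVENTQUEUED means it was accepted
               and will complete later through the event pipe.  The buffer
               stays "in use" until the completion arrives.  */
            if (read(conn->fd, conn->buf, sizeof(conn->buf)) < 0 &&
                errno != EIOEVENTQUEUED)
                    handle_error(conn, errno);      /* e.g. EIOEVENTFULL */

            for (;;) {
                    struct ioevent ev;
                    struct connection *c;

                    /* Assumption: one completed struct ioevent per read()
                       on the event pipe descriptor.  */
                    if (read(evpipe, &ev, sizeof(ev)) != sizeof(ev))
                            break;

                    c = ev.ioe_user;
                    if (ev.ioe_result < 0)
                            handle_error(c, ev.ioe_errno);
                    else
                            process_data(c, ev.ioe_result);
            }
    }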

I note you have the EIOEVENTFULL error, which is fine for the case
where the application initiates the I/O. But what happens if you've
set up a TCP connection to feed events into the I/O pipe? Imagine a
sudden flurry of incoming data on a bunch of connections fills your
event pipe: you would either have to drop packets or resize the event
pipe. Let me just open a hundred HTTP connections to your server and
start flooding you with requests. Your event pipe fills, you start
dropping my TCP packets, but I keep sending stuff to ensure your event
pipe stays full. This starves the connections of nicer people who want
to use rather than abuse your server. Oops. Denial of service.
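
To spell out why EIOEVENTFULL doesn't help in that case: when the event
is generated by a remote host rather than by a local system call, there
is nobody to hand the error back to. A crude pseudo-kernel sketch of the
decision point (every name here is made up for illustration, this is
nothing like real Linux TCP code):

    /* Hypothetical kernel-side handling of data arriving on a socket
       that has been attached to an ioeventpipe.  */
    void tcp_data_arrived(struct attached_socket *sk, struct packet *pkt)
    {
            struct ioeventpipe *evpipe = sk->evpipe;

            if (ioeventpipe_full(evpipe)) {
                    /* There is no local caller to return EIOEVENTFULL to;
                       the "caller" is a remote host.  The choices are:
                         - drop the segment and let TCP retransmit, which a
                           hostile sender exploits to keep the pipe full and
                           starve well-behaved connections;
                         - grow the pipe, which means unbounded kernel
                           memory pinned by a remote host.  */
                    drop_packet(pkt);
                    return;
            }

            /* Normal path: queue a completion tagged with the user pointer. */
            ioeventpipe_queue(evpipe, sk->ioe_user, pkt->len, 0);
    }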

Regards,

Richard....