Re: ioevent queues (was Re: Proposed new poll2() syscall)
Richard Gooch (rgooch@atnf.CSIRO.AU)
Sun, 24 Aug 1997 15:56:05 +1000
Dean Gaudet writes:
> Warning, this is long, but I think worth it. If you've heard of NT's
> completion ports that's where I'm heading.
> Here's something that's more ambitious than a new poll(), but is also
> really important for scalable I/O. The problem with both poll() and
> select() is that they both require the kernel and the user to deal with a
> list of all possible events, rather than a list of exactly the events that
> have occurred. In a multithreaded/multiprocess application you get into
> even worse situations where multiple workers get the same (or similar)
> list of events and then fight with each other as to who will do what.
> Now suppose you had a magic pipe that spit out structures looking like
> this:
> struct ioevent {
>     void *user_supplied_pointer;
> };
> The ioevent is an indication that some asynchronous i/o event is now
> ready. It could be a socket is ready for accept, or a descriptor is ready
> for writing (or even more cool things that are difficult to multiplex in
> unix right now, such as a child has exited). The user_supplied_pointer is
> just that, a pointer that you told the kernel to associate with a
> particular descriptor (through fcntl() probably).
> Here's my suggested API:
> struct ioevent {
>     void *ioe_user;   /* user_supplied_pointer was for pedagogical purposes */
>     int ioe_result;   /* system call result code */
>     int ioe_errno;    /* system call errno */
> };
> ioeventpipe (int nevents)
>     - returns a descriptor to an ioevent_pipe capable of handling nevents
>       outstanding, possibly uncompleted events.
> fcntl (fd, F_SETIOEVENTPIPE, ioeventpipe_fd, user)
>     - all i/o on fd will be done asynchronously and results will be
>       queued on ioeventpipe
> fcntl (fd, F_GETIOEVENTPIPE, &ioeventpipe_fd, &user)
>     - return the associated setting
> New errno:
>     EIOEVENTFULL    possibly returned by any i/o operation on a
>                     descriptor which is attached to an ioeventpipe. This is
>                     returned when the ioeventpipe is full. In this case the
>                     i/o has not been scheduled.
>     EIOEVENTQUEUED  returned by any i/o operation on a descriptor which
>                     was successfully queued on its ioeventpipe.
>                     The system call happens asynchronously, and all buffers, or other
>                     structures given to it are considered "in use" because the kernel
>                     will complete the call and stuff the result code and errno into
>                     the struct ioevent. This is to eliminate the need to make the
>                     io call twice -- i.e. once to cause it to be queued, and then once
>                     to get the result.
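To pin down the semantics being discussed, here is a minimal user-space
model of the proposed queue. It is only a sketch under assumptions: none of
this exists in any kernel, the numeric values of EIOEVENTFULL and
EIOEVENTQUEUED are made up, the names ioevent_pipe, evpipe_init,
evpipe_submit and evpipe_reap are hypothetical, and every submitted
operation is treated as completing immediately with the result passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the proposal names the codes but assigns no numbers. */
enum { EIOEVENTFULL = 1001, EIOEVENTQUEUED = 1002 };

struct ioevent {
    void *ioe_user;   /* pointer the application attached to the descriptor */
    int ioe_result;   /* system call result code */
    int ioe_errno;    /* system call errno */
};

#define EVPIPE_MAX 64 /* arbitrary upper bound for this model */

struct ioevent_pipe {
    struct ioevent slots[EVPIPE_MAX];
    size_t capacity;     /* the nevents argument of ioeventpipe() */
    size_t outstanding;  /* events queued and not yet reaped */
    size_t head;         /* index of the oldest queued event */
};

/* Models ioeventpipe(nevents). */
void evpipe_init(struct ioevent_pipe *p, size_t nevents)
{
    assert(nevents > 0 && nevents <= EVPIPE_MAX);
    p->capacity = nevents;
    p->outstanding = 0;
    p->head = 0;
}

/* Models an i/o call on an attached descriptor: either the operation is
 * queued (EIOEVENTQUEUED) or the pipe is full and nothing is scheduled
 * (EIOEVENTFULL). */
int evpipe_submit(struct ioevent_pipe *p, void *user, int result, int err)
{
    if (p->outstanding == p->capacity)
        return EIOEVENTFULL;
    size_t tail = (p->head + p->outstanding) % p->capacity;
    p->slots[tail] = (struct ioevent){ user, result, err };
    p->outstanding++;
    return EIOEVENTQUEUED;
}

/* Models reading one event from the pipe: returns 1 and fills *out,
 * or returns 0 if no event is queued. */
int evpipe_reap(struct ioevent_pipe *p, struct ioevent *out)
{
    if (p->outstanding == 0)
        return 0;
    *out = p->slots[p->head];
    p->head = (p->head + 1) % p->capacity;
    p->outstanding--;
    return 1;
}
```

With a pipe of one slot, a second evpipe_submit() before any evpipe_reap()
gets EIOEVENTFULL, which is the case the application can see and handle.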
I note you have the EIOEVENTFULL error, which is fine for the case
where the application attempts to generate I/O. What happens if you've
set up a TCP connection to feed events into the event pipe? Imagine a
sudden flurry of incoming data on a bunch of connections, and your
event pipe fills. You would either have to drop packets or resize the
event pipe. Let me just open a hundred HTTP connections to your server
and start flooding you with requests. Your event pipe fills, you start
dropping my TCP packets, but I keep sending stuff to ensure your event
pipe stays full. This shuts out connections from nicer people who
want to use rather than abuse your server. Oops. Denial of service.
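The starvation can be illustrated with a toy simulation. This is a rough
sketch under stated assumptions, not a model of any real TCP stack: time is
cut into ticks, each hostile connection posts one event per tick ahead of a
single well-behaved client (the worst case for that client), the server
reaps a fixed number of events per tick, and an event that finds the pipe
full is simply dropped. The names flood_stats and simulate_flood are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct flood_stats {
    unsigned long hostile_queued, hostile_dropped;
    unsigned long nice_queued, nice_dropped;
};

/* Simulate `ticks` rounds against an event pipe holding `capacity` events.
 * Assumed order per tick: hostile events arrive, then one nice event,
 * then the server reaps up to `drain_per_tick` events. */
struct flood_stats simulate_flood(size_t capacity, size_t hostile_conns,
                                  size_t drain_per_tick, size_t ticks)
{
    struct flood_stats s = { 0, 0, 0, 0 };
    size_t queued = 0; /* current occupancy of the event pipe */

    assert(capacity > 0);
    for (size_t t = 0; t < ticks; t++) {
        for (size_t c = 0; c < hostile_conns; c++) {
            if (queued < capacity) {
                queued++;
                s.hostile_queued++;
            } else {
                s.hostile_dropped++;
            }
        }
        if (queued < capacity) {
            queued++;
            s.nice_queued++;
        } else {
            s.nice_dropped++;
        }
        /* The server reaps what it can before the next tick. */
        queued -= (queued < drain_per_tick) ? queued : drain_per_tick;
    }
    return s;
}
```

Under these assumptions, simulate_flood(64, 100, 32, 10) never queues a
single event for the well-behaved client, while simulate_flood(64, 0, 32, 10)
queues all ten of them.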