Re: Optimising poll(2)

Richard Gooch (rgooch@atnf.CSIRO.AU)
Sat, 23 Aug 1997 23:04:09 +1000


Martin von Loewis writes:
> > What I propose would change the way polling is done in the kernel,
> > which of course involves a fair bit of work, but is hopefully worth
> > it. The basic idea is to define a new field (called, say,
> > "poll_events") inside the "struct file" structure which contains a
> > poll event mask, just like what you get back from poll(). It would be
> > the responsibility of each driver to add bits like POLLIN when data is
> > available for reading and POLLOUT when data can be written without
> > blocking, and of course all the other defined bits.
>
> This sounds like a reasonable idea, and I like it much more than your
> original poll2(2) proposal.

I still maintain that a poll2() syscall along the lines of what I
suggested is worthwhile. As my timing tests have indicated, we can
bring down the zero-timeout cost of poll(2) to under 1 millisecond on
a Pentium 100 for 1021 descriptors. Without poll2(2) the application
*still* has to scan the entire list, which would take over 230
microseconds (1021 descriptors). This is around one third of the time
now taken for poll(2), which is becoming even harder to ignore.
The point of poll2(2) is that the kernel has already *done* the work:
why not make use of it?
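
To make the scanning cost concrete, here is a rough sketch of the pass
an application has to make over its pollfd array once poll(2) has
returned. The helper name collect_ready() and its parameters are made
up for illustration; only struct pollfd and its revents field come
from the real interface.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Scan a pollfd array after poll() has returned and copy the indices
 * of entries with a non-zero revents into 'ready'. Returns how many
 * were stored. Every entry is visited, even if only a few are active,
 * which is the per-call cost discussed above.
 * (Hypothetical helper, not part of any library.) */
static size_t collect_ready(const struct pollfd *fds, size_t nfds,
                            size_t *ready, size_t max_ready)
{
    size_t n = 0;

    assert(fds != NULL && ready != NULL);
    for (size_t i = 0; i < nfds && n < max_ready; i++) {
        if (fds[i].revents != 0)
            ready[n++] = i;
    }
    return n;
}
```

With 1021 descriptors and one or two active, nearly all of the loop
iterations find nothing to do.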

I don't understand what it is about poll2(2) that you dislike: just
the idea of another syscall? Because it's not POSIX? Something else?

> As you noticed, it might require a lot of changes. That could
> be reduced if you add a flag to struct file that tells whether the
> underlying driver supports synchronizing the field at all. Then you
> could go through the files and see whether you can trust this field,
> and call the poll operation otherwise.

Yes, I had thought of this. If this were to be done, we would
definitely want to ease the transition. I haven't looked yet at how
file structures are allocated: we would have to make sure the flag
would always be initialised. If there is a single function which
allocates them, great. Otherwise some care will need to be taken.
I first wanted to see what people thought about the general idea.
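
As a rough user-space model of the transition scheme (this is not
kernel code, and apart from "poll_events" every name here is invented
for illustration), the check could look something like this. The
assumption is that a file starts out with the flag cleared, so an
unconverted driver always falls back to its poll operation.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of "struct file". */
struct mock_file {
    short poll_events;        /* event mask kept up to date by the driver */
    int   poll_events_valid;  /* assumed flag: driver maintains poll_events */
    short (*poll_op)(struct mock_file *);  /* stand-in for the poll method */
};

/* Single allocation-time initialiser, so the flag is never left unset. */
static void mock_file_init(struct mock_file *f,
                           short (*poll_op)(struct mock_file *))
{
    assert(f != NULL && poll_op != NULL);
    f->poll_events = 0;
    f->poll_events_valid = 0;
    f->poll_op = poll_op;
}

/* Trust the cached mask if the driver supports it; otherwise call the
 * driver's poll operation as before. */
static short mock_file_events(struct mock_file *f)
{
    assert(f != NULL);
    if (f->poll_events_valid)
        return f->poll_events;
    return f->poll_op(f);
}

/* Example of an unconverted driver: always reports writable. */
static short legacy_poll(struct mock_file *f)
{
    (void)f;
    return POLLOUT;
}
```

A converted driver would set poll_events_valid once and then only
update poll_events as its state changes.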

> With this interface, you could then implement that improved polling
> for the most interesting case: TCP sockets. If you get a significant
> speed-up on this, other driver authors could consider following your
> interface.

I think 850 microseconds for 1021 descriptors, down from 4 900 and
2 900 microseconds, is pretty significant :-)

Regards,

Richard....