With small changes to the existing model, we can get significant
speedups. A problem I see with ioevents is that it's quite a different
model.
> select() is O(n) where n is the max descriptor. poll() is O(m) where m is
> the number of descriptors. Both are linear in their inputs ... m <= n...
> poll is generally more efficient at the kernel level since select is
> implemented in terms of poll. Even in the cases where the outputs are
> sparse and find_first_zero_bit becomes a win, you still had to pay for
> O(n) operations inside select(). So timing things at the user level only
> is totally bogus.
I sent the following analysis offline to someone else, but it's
probably of wider interest:
Let's compare the costs:
select(2):
- user-space copy of fd_sets (small copy)
- kernel-space iteration for each fd to do __get_user for each event
type: hey! that's *3* scans
- kernel-space clearing of fd_set_buffer (3 small clears)
- kernel-space iteration for each fd to make poll masks with 3 bit
manipulations
- kernel-space iteration for each fd to do __put_user for each event
type: another 3 scans
- user-space find-first/find-next scan to find who's active (fast scan)
I count: 7 full scans of the fd list
1 small copy
3 small clears
1 fast scan
poll(2):
- kernel-space copy_from_user of pollfds
- kernel-space iteration for each fd to manipulate poll mask
- kernel-space iteration for each fd to do __put_user
- user-space iteration for each fd to find who's active
I count: 3 full scans of the fd list
1 full copy
poll2(2):
- kernel-space copy_from_user of pollfds
- kernel-space iteration for each fd to manipulate poll mask
- kernel-space iteration for each ready fd to do __put_user
- user-space iteration for each ready fd to find who's active
I count: 1 full scan of the fd list
1 tiny scan/copy of ready fds
1 tiny scan of ready fds
Regards,
Richard....