Hence my internal optimisations are my first priority. I have no
intention of submitting a patch for poll2(2) before the major
bottlenecks are removed. For now I'm just testing poll2(2) for my own
interest, to see how it improves as these other issues are fixed.
> In this case you claimed that "select" showed a bottleneck somewhere.
> You supported your claim by submitting an awfully inefficient piece of
> code.
>
> In general you've already made up your mind before you start
> benchmarking, and it is hard to make the enemy look good by writing
> good code for the "enemy".
>
> Now it seems that you are finally running the right benchmarks, and
> that is a good thing.
That sounds a tad defensive. Pardon me for evolving my benchmarks as I
investigate where all the different bottlenecks are (and remove them).
> Doing the right thing for the wrong reasons is "bad".
My reasons are simple: I want to speed up polling under Linux. My
benchmarks are needed so I can evaluate how my internal changes
improve things, as well as comparing with poll2(2) which I believe has
certain benefits (and if I'm wrong, I'll still post the benchmarks,
once they're all done).
> If you really are doing the right benchmarks and those indicate that
> your new approach is promising, then I say you should go for it.
I sent my code: any suggestions for further optimisations? I'll roll
in application code which actually processes the results from each of
the three syscalls (to give a true picture of total load) once I've
finished the internal optimisations.
> P.S. For sparse bitvectors the select code should probably scan the
> in, out and ex set separately, and use "find_first_zero" like calls.
You mean the kernel code, right? The trouble is: how do you know that
the bitvectors are sparse? Apply some kind of test (analyse the
fd_sets) to find out? You'd want to be careful that the test doesn't
take too much time.
I suppose you could just do "find_first_bit" to skip to the start of
the set bits, and from then on just continue as normal (rather than
using "find_next_bit"). That would definitely help in those cases
where you are checking for activity on descriptors 900-1000.
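A minimal userspace sketch of that idea, not kernel code: the helper
first_set_bit() is a hypothetical stand-in for the kernel's
find_first_bit, and scan_from_first() only counts how many bit
positions a plain bit-by-bit scan visits once it has skipped to the
first set bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for find_first_bit: index of the first set
 * bit, or nbits if no bit below nbits is set. */
static size_t first_set_bit(const unsigned long *words, size_t nbits)
{
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    assert(words != NULL);
    for (size_t w = 0; w < nwords; w++) {
        if (words[w] == 0)
            continue;               /* skip a whole clear word at once */
        for (size_t b = 0; b < BITS_PER_WORD; b++) {
            if (words[w] & (1UL << b)) {
                size_t idx = w * BITS_PER_WORD + b;
                return idx < nbits ? idx : nbits;
            }
        }
    }
    return nbits;
}

/* Skip to the first set bit, then scan every position as normal.
 * Returns the number of positions visited; *nset gets the number of
 * set bits found. */
static size_t scan_from_first(const unsigned long *words, size_t nbits,
                              size_t *nset)
{
    size_t visited = 0;

    *nset = 0;
    for (size_t i = first_set_bit(words, nbits); i < nbits; i++) {
        visited++;
        if (words[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            (*nset)++;
    }
    return visited;
}
```

With descriptors 900-999 set in a 1024-bit set, the normal scan starts
at 900 rather than 0.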
Right now it would be murder to separate the in, out and ex scans, as
it would mean you have to call the indirect poll functions 3 times as
often as now. That could make select(2) up to three times slower.
Once the "poll_events" field is available, this triple scanning won't
be so painful. Although you would end up skipping through memory three
times.
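To illustrate the cost of separate scans with the current indirect
poll functions, here is a rough userspace sketch. Everything in it is
hypothetical: fake_poll() stands in for a driver's poll function, and
the sets are plain byte arrays rather than real fd_sets. It only
counts indirect calls; it says nothing about actual timings.

```c
#include <assert.h>
#include <stddef.h>

#define NFDS 16

/* Counts how often the (hypothetical) indirect poll hook is called. */
static unsigned long poll_calls;

static unsigned fake_poll(size_t fd)
{
    (void)fd;
    poll_calls++;
    return 0;                       /* pretend nothing is ready */
}

/* Combined pass: one indirect call per descriptor of interest. */
static void scan_combined(const unsigned char *in,
                          const unsigned char *out,
                          const unsigned char *ex,
                          unsigned (*poll_fn)(size_t))
{
    assert(poll_fn != NULL);
    for (size_t fd = 0; fd < NFDS; fd++)
        if (in[fd] || out[fd] || ex[fd])
            poll_fn(fd);
}

/* Single-set pass: one indirect call per member of this set. */
static void scan_one(const unsigned char *set,
                     unsigned (*poll_fn)(size_t))
{
    for (size_t fd = 0; fd < NFDS; fd++)
        if (set[fd])
            poll_fn(fd);
}

/* Separate passes: a descriptor in all three sets is polled 3 times. */
static void scan_separate(const unsigned char *in,
                          const unsigned char *out,
                          const unsigned char *ex,
                          unsigned (*poll_fn)(size_t))
{
    assert(poll_fn != NULL);
    scan_one(in, poll_fn);
    scan_one(out, poll_fn);
    scan_one(ex, poll_fn);
}
```

The factor of three is the worst case, where every descriptor appears
in all three sets; a descriptor in only one set costs the same either
way.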
Hm. Perhaps a simple and cheap alternative is to use find_first_bit on
each fd_set, then take the minimum index of the three, and start
normal scanning from there. Of course, it won't help in the case where
you are selecting on descriptors 100,200,300,400,500,600,700,800,900
or some other such sparse set.
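A sketch of that alternative, again in userspace and under the same
assumptions: first_bit() is a hypothetical stand-in for
find_first_bit, and scan_start() is an illustrative name for "take
the minimum of the three". A NULL set is treated as empty, as
select(2) allows.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for find_first_bit: index of the first set
 * bit, or nbits if the set is NULL or has no bit set below nbits. */
static size_t first_bit(const unsigned long *words, size_t nbits)
{
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;

    if (words == NULL)
        return nbits;
    for (size_t w = 0; w < nwords; w++) {
        if (words[w] == 0)
            continue;               /* skip a whole clear word at once */
        for (size_t b = 0; b < WORD_BITS; b++) {
            if (words[w] & (1UL << b)) {
                size_t idx = w * WORD_BITS + b;
                return idx < nbits ? idx : nbits;
            }
        }
    }
    return nbits;
}

/* Lowest descriptor present in any of the three sets; the normal
 * combined scan would start from here. Returns nbits if all are empty. */
static size_t scan_start(const unsigned long *in,
                         const unsigned long *out,
                         const unsigned long *ex, size_t nbits)
{
    size_t start = first_bit(in, nbits);
    size_t o = first_bit(out, nbits);
    size_t e = first_bit(ex, nbits);

    if (o < start)
        start = o;
    if (e < start)
        start = e;
    assert(start <= nbits);
    return start;
}
```

For the sparse set 100,200,...,900 this returns 100, so the normal
scan still walks every position from 100 upwards, which is the
limitation noted above.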
Regards,
Richard....