Heiko> Originaly, we had the same idea as you mentioned, that it
Heiko> would be better to do this in the higher levels. The point
Heiko> is that we can't see so far any simple posibility how this
Heiko> can done in the OpenIB stack, the TCP/IP network layer or
Heiko> somewhere in the Linux kernel.
Heiko> For example: For IPoIB we get the best throughput when we
Heiko> do the CQ callbacks on different CPUs and not to stay on
Heiko> the same CPU.
So why not do it in IPoIB then? This approach is not optimal
globally. For example, uverbs event dispatch is just going to queue
an event and wake up the process waiting for events, and doing this on
some random CPU not related to the where the process will run is
clearly the worst possible way to dispatch the event.
Heiko> In other papers and slides (see [1]) you can see similar
Heiko> approaches.
Heiko> [1]: Speeding up Networking, Van Jacobson and Bob
Heiko> Felderman,
Heiko> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
I think you've misunderstood this paper. It's about maximizing CPU
locality and pushing processing directly into the consumer. In the
context of slide 9, what you've done is sort of like adding another
control loop inside the kernel, since you dispatch from interrupt
handler to driver thread to final consumer. So I would argue that
your approach is exactly the opposite of what VJ is advocating.