Thanks all for the comments and apologies for the delay in replying.
Stan and Joe, I've addressed some of the common concerns below.
On Thu, Aug 29, 2024 at 3:40 AM Joe Damato <jdamato@xxxxxxxxxx> wrote:
> On Wed, Aug 28, 2024 at 06:10:11PM +0000, Naman Gulati wrote:
> > NAPI busypolling in ep_busy_loop loops on napi_poll and checks for new
> > epoll events after every napi poll. Checking just for epoll events in a
> > tight loop in the kernel context delivers latency gains to applications
> > that are not interested in napi busypolling with epoll.
> >
> > This patch adds an option to loop just for new events inside
> > ep_busy_loop, guarded by the EPIOCSPARAMS ioctl that controls epoll napi
> > busypolling.
> This makes an API change, so I think that linux-api@xxxxxxxxxxxxxxx
> needs to be CC'd ?
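
For reference on the uAPI this builds on: user space already opts into
epoll busy polling through the existing EPIOCSPARAMS ioctl, roughly as in
the sketch below. The parameter values are purely illustrative, and how
the new events-only loop is selected on top of this is defined by the
patch itself and not shown here.

/*
 * Rough sketch, not part of the patch: enabling epoll busy polling from
 * user space via EPIOCSPARAMS (kernel >= 6.9).  Values are illustrative.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>        /* _IOW */
#include <sys/epoll.h>

#ifndef EPIOCSPARAMS
/*
 * Fallback for libcs that don't expose the uAPI yet; mirrors
 * include/uapi/linux/eventpoll.h (double-check against your headers).
 */
struct epoll_params {
        uint32_t busy_poll_usecs;
        uint16_t busy_poll_budget;
        uint8_t  prefer_busy_poll;
        uint8_t  __pad;          /* must be zero */
};
#define EPOLL_IOC_TYPE 0x8A
#define EPIOCSPARAMS _IOW(EPOLL_IOC_TYPE, 0x01, struct epoll_params)
#endif

static int epoll_with_busy_poll(void)
{
        int epfd = epoll_create1(0);

        if (epfd < 0) {
                perror("epoll_create1");
                return -1;
        }

        struct epoll_params params = {
                .busy_poll_usecs  = 64, /* illustrative busy-poll duration */
                .busy_poll_budget = 8,  /* illustrative per-poll budget */
                .prefer_busy_poll = 1,
        };

        if (ioctl(epfd, EPIOCSPARAMS, &params) < 0)
                perror("ioctl(EPIOCSPARAMS)");

        return epfd;
}
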
> > A comparison with neper tcp_rr shows that busylooping for events in
> > epoll_wait boosted throughput by ~3-7% and reduced median latency by
> > ~10%.
> >
> > To demonstrate the latency and throughput improvements, a comparison was
> > made of neper tcp_rr running with:
> >
> > 1. (baseline) No busylooping
> Is there NAPI-based steering to threads via SO_INCOMING_NAPI_ID in
> this case? More details, please, on locality. If there is no
> NAPI-based flow steering in this case, perhaps the improvements you
> are seeing are a result of both syscall overhead avoidance and data
> locality?
The benchmarks were run with no NAPI steering, i.e. connections were not
pinned to threads based on SO_INCOMING_NAPI_ID (a rough sketch of that
pattern follows below, for reference).
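
To make that concrete, NAPI-based steering would look roughly like the
following; route_fd_to_worker() is a hypothetical placeholder for the
application's dispatch logic, and none of this was done in the benchmarks:

/*
 * Rough sketch of NAPI-based steering via SO_INCOMING_NAPI_ID (NOT used
 * in the benchmarks above).  route_fd_to_worker() is a hypothetical
 * placeholder for the application's dispatch logic.
 */
#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56  /* from asm-generic/socket.h */
#endif

extern void route_fd_to_worker(int fd, unsigned int napi_id);

static void steer_new_connection(int listen_fd)
{
        int fd = accept(listen_fd, NULL, NULL);
        unsigned int napi_id = 0;
        socklen_t len = sizeof(napi_id);

        if (fd < 0) {
                perror("accept");
                return;
        }

        if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID,
                       &napi_id, &len) < 0)
                perror("getsockopt(SO_INCOMING_NAPI_ID)");

        /*
         * Connections sharing a NAPI ID arrive on the same RX queue, so
         * keeping them on one thread/epoll instance preserves locality.
         */
        route_fd_to_worker(fd, napi_id);
}
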
Regarding syscall overhead, I reproduced the experiment above with
mitigations=off and found similar results, which points to the gains
coming from more than just avoided syscall overhead.