Re: [RFC net-next 5/5] eventpoll: Control irq suspension for prefer_busy_poll

From: Stanislav Fomichev
Date: Mon Aug 12 2024 - 16:21:03 EST


On 08/12, Joe Damato wrote:
> From: Martin Karsten <mkarsten@xxxxxxxxxxxx>
>
> When events are reported to userland and prefer_busy_poll is set, irqs are
> temporarily suspended using napi_suspend_irqs.
>
> If no events are found and ep_poll would go to sleep, irq suspension is
> cancelled using napi_resume_irqs.
>
> Signed-off-by: Martin Karsten <mkarsten@xxxxxxxxxxxx>
> Co-developed-by: Joe Damato <jdamato@xxxxxxxxxx>
> Signed-off-by: Joe Damato <jdamato@xxxxxxxxxx>
> Tested-by: Joe Damato <jdamato@xxxxxxxxxx>
> Tested-by: Martin Karsten <mkarsten@xxxxxxxxxxxx>
> ---
> fs/eventpoll.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index cc47f72005ed..d74b5b9c1f51 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -457,6 +457,8 @@ static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
>  		 * it back in when we have moved a socket with a valid NAPI
>  		 * ID onto the ready list.
>  		 */
> +		if (prefer_busy_poll)
> +			napi_resume_irqs(napi_id);
>  		ep->napi_id = 0;
>  		return false;
>  	}
> @@ -540,6 +542,14 @@ static long ep_eventpoll_bp_ioctl(struct file *file, unsigned int cmd,
>  	}
>  }
>
> +static void ep_suspend_napi_irqs(struct eventpoll *ep)
> +{
> +	unsigned int napi_id = READ_ONCE(ep->napi_id);
> +
> +	if (napi_id >= MIN_NAPI_ID && READ_ONCE(ep->prefer_busy_poll))
> +		napi_suspend_irqs(napi_id);
> +}
> +
> #else
>
> static inline bool ep_busy_loop(struct eventpoll *ep, int nonblock)
> @@ -557,6 +567,10 @@ static long ep_eventpoll_bp_ioctl(struct file *file, unsigned int cmd,
>  	return -EOPNOTSUPP;
> }
>
> +static void ep_suspend_napi_irqs(struct eventpoll *ep)
> +{
> +}
> +
> #endif /* CONFIG_NET_RX_BUSY_POLL */
>
> /*
> @@ -788,6 +802,10 @@ static bool ep_refcount_dec_and_test(struct eventpoll *ep)
>
>  static void ep_free(struct eventpoll *ep)
>  {
> +	unsigned int napi_id = READ_ONCE(ep->napi_id);
> +
> +	if (napi_id >= MIN_NAPI_ID && READ_ONCE(ep->prefer_busy_poll))
> +		napi_resume_irqs(napi_id);
>  	mutex_destroy(&ep->mtx);
>  	free_uid(ep->user);
>  	wakeup_source_unregister(ep->ws);
> @@ -2005,8 +2023,10 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  			 * trying again in search of more luck.
>  			 */
>  			res = ep_send_events(ep, events, maxevents);
> -			if (res)
> +			if (res) {
> +				ep_suspend_napi_irqs(ep);

Aren't we already deferring in busy_poll_stop? (or in napi_poll
when it's complete/done). Why do we need another rearming here?
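
To make the question concrete, here is a rough userspace sketch of the
flow as the commit message describes it: suspend when events are
reported to userland, resume when nothing is found and ep_poll would go
to sleep. Everything named toy_* is a hypothetical stand-in and not a
kernel API. It only models a single suspended/resumed flag and ignores
the existing deferral paths asked about above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-NAPI state; not struct napi_struct. */
struct toy_napi {
	bool irqs_suspended;
};

/* Hypothetical stand-in for the fields of struct eventpoll used here. */
struct toy_ep {
	struct toy_napi *napi;	/* NULL models "no valid NAPI ID" */
	bool prefer_busy_poll;
};

/* Mirrors the guard in ep_suspend_napi_irqs() from the quoted patch. */
static void toy_suspend_irqs(struct toy_ep *ep)
{
	if (ep->napi && ep->prefer_busy_poll)
		ep->napi->irqs_suspended = true;
}

/* Assumed counterpart: cancel the suspension under the same guard. */
static void toy_resume_irqs(struct toy_ep *ep)
{
	if (ep->napi && ep->prefer_busy_poll)
		ep->napi->irqs_suspended = false;
}

/*
 * Models one pass through ep_poll(): nevents is the number of events
 * that would be reported to userland on this pass.
 */
static int toy_ep_poll(struct toy_ep *ep, int nevents)
{
	if (nevents > 0) {
		/* Events reported: keep irqs off while the app is busy. */
		toy_suspend_irqs(ep);
		return nevents;
	}

	/* Nothing found, about to sleep: hand control back to irqs. */
	toy_resume_irqs(ep);
	return 0;
}
```

In this model the flag flips on every transition between "events
returned" and "about to sleep", which is the extra rearming point in
question.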