Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll

From: Olivier Langlois
Date: Tue Mar 01 2022 - 15:06:56 EST


On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
>
> > +       ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
> > +       if (!ne)
> > +               goto out;
>
> IMHO, we need to handle -ENOMEM here, I cut off the error handling
> when
>
> I did the quick coding. Sorry for misleading.

If you are correct, I would be shocked about this.

I did return in my 'Linux Device Drivers' book and nowhere it is
mentionned that the kmalloc() can return something else than a pointer

No mention at all about the return value

in man page:
https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html
API doc:

https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kmalloc#c.kmalloc

header file:
https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L522

I did browse into the kmalloc code. There is a lot of paths to cover
but from preliminary reading, it pretty much seems that kmalloc only
returns a valid pointer or NULL...

/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
* @flags: See kmalloc().
*
* Allocate an object from this cache. The flags are only relevant
* if the cache has no available objects.
*
* Return: pointer to the new object or %NULL in case of error
*/

/**
* __do_kmalloc - allocate memory
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
* @caller: function caller for debug tracking of the caller
*
* Return: pointer to the allocated memory or %NULL in case of error
*/

I'll need someone else to confirm about possible kmalloc() return
values with perhaps an example

I am a bit skeptic that something special needs to be done here...

Or perhaps you are suggesting that io_add_napi() returns an error code
when allocation fails.

as done here:
https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/core_marvel.c#L867

If that is what you suggest, what would this info do for the caller?

IMHO, it wouldn't help in any way...
>
> >
> > @@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx
> > *ctx, bool cap_entries)
> >                     !(ctx->flags & IORING_SETUP_R_DISABLED))
> >                         ret = io_submit_sqes(ctx, to_submit);
> >                 mutex_unlock(&ctx->uring_lock);
> > -
> > +#ifdef CONFIG_NET_RX_BUSY_POLL
> > +               if (!list_empty(&ctx->napi_list) &&
> > +                   io_napi_busy_loop(&ctx->napi_list))
>
> I'm afraid we may need lock for sqpoll too, since io_add_napi() could
> be
> in iowq context.
>
> I'll take a look at the lock stuff of this patch tomorrow, too late
> now
> in my timezone.

Ok, please do. I'm not a big user of io workers. I may have omitted to
consider this possibility.

If that is the case, I think that this would be very easy to fix by
locking the spinlock while __io_sq_thread() is using napi_list.
>
> How about:
>
> if (list is singular) {
>
>      do something;
>
>      return;
>
> }
>
> while (!io_busy_loop_end() && io_napi_busy_loop())
>
>      ;
>

is there a concern with the current code?
What would be the benefit of your suggestion over current code?

To me, it seems that if io_blocking_napi_busy_loop() is called, a
reasonable expectation would be that some busy looping is done or else
you could return the function without doing anything which would, IMHO,
be misleading.

By definition, napi_busy_loop() is not blocking and if you desire the
device to be in busy poll mode, you need to do it once in a while or
else, after a certain time, the device will return back to its
interrupt mode.

IOW, io_blocking_napi_busy_loop() follows the same logic used by
napi_busy_loop() that does not call loop_end() before having perform 1
loop iteration.

> Btw, start_time seems not used in singular branch.

I know. This is why it is conditionally initialized.

Greetings,