Re: [PATCH] io_uring: One wqe per wq

From: Pavel Begunkov
Date: Sun Mar 12 2023 - 23:57:47 EST


On 3/11/23 22:13, Jens Axboe wrote:
> On 3/11/23 1:56 PM, Pavel Begunkov wrote:
>> On 3/10/23 20:38, Jens Axboe wrote:
>>> On 3/10/23 1:11 PM, Breno Leitao wrote:
>>>> Right now io_wq allocates one io_wqe per NUMA node.  As io_wq is now
>>>> bound to a task, the task basically uses only the NUMA local io_wqe, and
>>>> almost never changes NUMA nodes, thus, the other wqes are mostly
>>>> unused.
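
To illustrate, the change described above boils down to something like the
sketch below (made-up names, not the actual io_wq code):

/*
 * Illustrative sketch only -- made-up names, not the real io_wq structures.
 * The idea: drop the per-NUMA-node array of wqes and embed a single wqe
 * in the wq itself, since the owning task rarely changes nodes.
 */
#define EXAMPLE_MAX_NUMNODES 64		/* stand-in for MAX_NUMNODES */

struct example_wqe {
	int nr_pending;			/* pending work, worker accounting, ... */
};

/* Before: one wqe allocated per NUMA node, most of them sitting unused. */
struct example_wq_old {
	struct example_wqe *node_wqe[EXAMPLE_MAX_NUMNODES];
};

/* After ("one wqe per wq"): a single wqe embedded in the wq. */
struct example_wq_new {
	struct example_wqe wqe;
};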

>>> What if the task gets migrated to a different node? Unless the task
>>> is pinned to a node/cpumask that is local to that node, it will move
>>> around freely.

>> In which case we're screwed anyway, and not only for the slow io-wq
>> path but also for the hot path, as the rings and all the io_uring ctx
>> and requests won't be migrated locally.

> Oh agree, not saying it's ideal, but it can happen.

> What if you deliberately use io-wq to offload work and you set it
> to another mask? That one I suppose we could handle by allocating
> based on the set mask. Two nodes might be more difficult...

>>> For most things this won't really matter, as io-wq is a slow path
>>> for that, but there might very well be cases that deliberately
>>> offload.

It's not created for that; there is no fine control by the user.
If the user sets affinity solely to another node, then it will
be quite bad for perf; if the mask covers multiple nodes, it'll
go to the current node. Do you have plans for io-wq across
NUMA nodes?
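
Roughly, picking the wqe node from the set mask could look something like
the sketch below (toy types and made-up helpers, not the kernel cpumask
API; the fallback matches the "go to the current node" behaviour above):

/*
 * Illustrative sketch only: one way "allocate based on the set mask"
 * could choose a home node for the single wqe.
 */
struct toy_mask { unsigned long bits; };	/* one bit per CPU, toy sized */

/* true if every allowed CPU belongs to this node */
static int mask_within_node(const struct toy_mask *allowed,
			    const struct toy_mask *node_cpus)
{
	return (allowed->bits & ~node_cpus->bits) == 0;
}

static int pick_wqe_node(const struct toy_mask *allowed,
			 const struct toy_mask node_cpus[], int nr_nodes,
			 int current_node)
{
	int node;

	/* If the task is affinitized entirely to one node, allocate there. */
	for (node = 0; node < nr_nodes; node++)
		if (mask_within_node(allowed, &node_cpus[node]))
			return node;

	/* Mask spans several nodes: fall back to the current node. */
	return current_node;
}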


>> It's also curious whether io-wq workers will get migrated
>> automatically, as they are a part of the thread group.

> They certainly will, unless affinitized otherwise.

--
Pavel Begunkov