On 3/11/23 1:56 PM, Pavel Begunkov wrote:
On 3/10/23 20:38, Jens Axboe wrote:
On 3/10/23 1:11 PM, Breno Leitao wrote:
Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now
bound to a task, the task basically uses only the NUMA-local io_wqe
and almost never changes NUMA nodes; thus, the other wqes are mostly
unused.
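
The layout in question is roughly this (a simplified sketch of the
idea, not the exact kernel definitions):

    /*
     * io_wq keeps one io_wqe per NUMA node and queues work to the
     * submitting task's local one, so with the wq bound to a single
     * task the other per-node entries sit mostly idle.
     */
    struct io_wqe {
            int node;                         /* NUMA node this entry serves */
            struct io_wq_work_list work_list; /* pending work for this node */
            /* workers, accounting, locks, ... */
    };

    struct io_wq {
            struct task_struct *task;         /* the wq is bound to one task */
            /* ... */
            struct io_wqe *wqes[];            /* one entry per NUMA node */
    };

    /* queueing picks the node-local entry, roughly:
     *      wqe = wq->wqes[numa_node_id()];
     */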
What if the task gets migrated to a different node? Unless the task
is pinned to a node/cpumask that is local to that node, it will move
around freely.
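
For reference, pinning the submitting task to one node's CPUs is just
an affinity call; a minimal userspace sketch, assuming libnuma is
available (link with -lnuma) and with the node number picked purely
for illustration:

    #include <numa.h>   /* numa_node_to_cpus(), numa_sched_setaffinity() */

    /* Pin the calling task to the CPUs of one NUMA node so the
     * scheduler cannot move it to another node. */
    static int pin_to_node(int node)
    {
            struct bitmask *cpus = numa_allocate_cpumask();
            int ret = -1;

            if (!cpus)
                    return -1;
            if (numa_node_to_cpus(node, cpus) == 0)
                    ret = numa_sched_setaffinity(0, cpus); /* pid 0 == current task */
            numa_free_cpumask(cpus);
            return ret;
    }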
In which case we're screwed anyway, and not only for the slow io-wq
path but also for the hot path, as the rings, the io_uring ctx, and
requests won't be migrated locally.
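
To illustrate the hot-path point: the ring and ctx memory stays on
whichever node originally backed it, and userspace can check which
node a page currently sits on with get_mempolicy(). A minimal sketch,
error handling elided:

    #include <numaif.h> /* get_mempolicy(), MPOL_F_NODE, MPOL_F_ADDR; -lnuma */

    /* Return the NUMA node currently backing the page at addr, or -1.
     * Handy for confirming that e.g. the SQ/CQ ring mmap stays on its
     * original node even after the submitting task has migrated. */
    static int node_of(void *addr)
    {
            int node = -1;

            if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR))
                    return -1;
            return node;
    }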
Oh agree, not saying it's ideal, but it can happen.
What if you deliberately use io-wq to offload work and you set it
to another mask?

That one I suppose we could handle by allocating based on the set
mask. Two nodes might be more difficult...
For most things this won't really matter as io-wq is a slow path
for that, but there might very well be cases that deliberately
offload.
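
For the deliberate-offload case, "the set mask" would be the io-wq
worker affinity an application registers, e.g. with liburing's
io_uring_register_iowq_aff(). A minimal sketch; the CPU numbers
standing in for a remote node are made up for this example:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <liburing.h>

    /* Steer io-wq workers onto a different set of CPUs (say, another
     * node) while the submitting task stays put. CPUs 8-15 here are
     * only an assumption about where the remote node's CPUs live. */
    static int offload_iowq_to_remote_cpus(struct io_uring *ring)
    {
            cpu_set_t mask;
            int cpu;

            CPU_ZERO(&mask);
            for (cpu = 8; cpu <= 15; cpu++)
                    CPU_SET(cpu, &mask);

            return io_uring_register_iowq_aff(ring, sizeof(mask), &mask);
    }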
It's also curious whether io-wq workers will get migrated
automatically as they are a part of the thread group.
They certainly will, unless affinitized otherwise.