Re: WQ_UNBOUND workqueue warnings from multiple drivers

From: Sagi Grimberg
Date: Sun Apr 07 2024 - 16:08:34 EST

On 03/04/2024 2:50, Kamaljit Singh wrote:
> Sagi, Chaitanya,
> Sorry for the delay, found your replies in the junk folder :(
>
>> Was the test you were running read-heavy?
> No, most of the failing fio tests were doing heavy writes. All were with
> 8 controllers and 32 namespaces each. IO specs are below.
>
> [1] bs=16k, iodepth=16, rwmixread=0, numjobs=16
>     Failed in ~1 min
>
> Some others were:
> [2] bs=8k, iodepth=16, rwmixread=5, numjobs=16
> [3] bs=8k, iodepth=16, rwmixread=50, numjobs=16
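
(For anyone following along, spec [1] maps to roughly the fio invocation
below; the device path, ioengine, direct flag and runtime are my guesses,
not taken from the report, so adjust to your setup:

    fio --name=wq-repro --filename=/dev/nvme0n1 --ioengine=libaio \
        --direct=1 --rw=randrw --rwmixread=0 --bs=16k --iodepth=16 \
        --numjobs=16 --time_based --runtime=120
)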

Interesting, that is the opposite of what I would suspect (I thought that
the workload would be read-only or read-mostly).

Does this happen with a 90%-100% read workload?

If we look at nvme_tcp_io_work() it is essentially looping,
doing send() and recv(), and every iteration checks whether a
1ms deadline has elapsed. The fact that it happens on a 100%
write workload leads me to conclude that the only way this can
happen is if sending a single 16K request to a controller on
its own takes more than 10ms (the default
wq_cpu_intensive_thresh_us detection threshold), which is
unexpected...
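
For reference, the loop in question looks roughly like this
(a simplified sketch of what is in drivers/nvme/host/tcp.c;
error handling and the rd_enabled checks are trimmed):

    static void nvme_tcp_io_work(struct work_struct *w)
    {
            struct nvme_tcp_queue *queue =
                    container_of(w, struct nvme_tcp_queue, io_work);
            unsigned long deadline = jiffies + msecs_to_jiffies(1);

            do {
                    bool pending = false;

                    /* push queued PDUs onto the socket */
                    if (mutex_trylock(&queue->send_mutex)) {
                            if (nvme_tcp_try_send(queue) > 0)
                                    pending = true;
                            mutex_unlock(&queue->send_mutex);
                    }

                    /* reap whatever the controller sent back */
                    if (nvme_tcp_try_recv(queue) > 0)
                            pending = true;

                    if (!pending)
                            return;
            } while (!time_after(jiffies, deadline));

            /* 1ms quota used up: requeue ourselves and yield the CPU */
            queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
    }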

Question: are you working with a Linux controller? What is
the ctrl ioccsz?
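
If you have nvme-cli on the host, something like the following
should show it (the controller device name here is just a
placeholder):

    nvme id-ctrl /dev/nvme0 | grep -i ioccsz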