Re: WQ_UNBOUND workqueue warnings from multiple drivers
From: Sagi Grimberg
Date: Sun Apr 07 2024 - 16:08:34 EST
On 03/04/2024 2:50, Kamaljit Singh wrote:
> Sagi, Chaitanya,
> Sorry for the delay, found your replies in the junk folder :(
>
>> Was the test you were running read-heavy?
>
> No, most of the failing fio tests were doing heavy writes. All were with 8 controllers and 32 NS each. io-specs are below.
>
> [1] bs=16k, iodepth=16, rwmixread=0, numjobs=16
>     Failed in ~1 min
>
> Some others were:
> [2] bs=8k, iodepth=16, rwmixread=5, numjobs=16
> [3] bs=8k, iodepth=16, rwmixread=50, numjobs=16
Interesting, that is the opposite of what I would suspect (I thought that
the workload would be read-only or read-mostly).
Does this happen with a 90%-100% read workload?
If we look at nvme_tcp_io_work() it is essentially looping
doing send() and recv() and every iteration checks if a 1ms
deadline elapsed. The fact that it happens on a 100% write
workload leads me to conclude that the only way this can
happen is if sending a single 16K request to a controller on its
own takes more than 10ms, which is unexpected...
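
For reference, the loop looks roughly like this (paraphrased from
drivers/nvme/host/tcp.c from memory, so helper and field names may
differ slightly between kernel versions):

static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;
		int result;

		/* send side: push queued command/data PDUs to the socket */
		if (mutex_trylock(&queue->send_mutex)) {
			result = nvme_tcp_try_send(queue);
			mutex_unlock(&queue->send_mutex);
			if (result > 0)
				pending = true;
			else if (unlikely(result < 0))
				break;
		}

		/* recv side: consume whatever the controller sent back */
		result = nvme_tcp_try_recv(queue);
		if (result > 0)
			pending = true;
		else if (unlikely(result < 0))
			return;

		if (!pending || !queue->rd_enabled)
			return;

	} while (!time_after(jiffies, deadline)); /* ~1ms budget */

	/* more work pending: requeue instead of hogging the worker */
	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}

So each invocation is supposed to give the CPU back after roughly 1ms
and requeue itself; it can only hog the CPU for >10ms if a single
try_send/try_recv pass itself stalls for that long.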
Question: are you working with a Linux controller? What
is the ctrl ioccsz?
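
I ask because ioccsz decides whether a write's data can go inline in
the command capsule or has to wait for an R2T exchange first. Roughly
(again paraphrased from drivers/nvme/host/tcp.c from memory, details
may differ by kernel version):

/* For I/O queues the capsule length is set from the controller's
 * IOCCSZ, which is reported in 16-byte units, i.e.
 * queue->cmnd_capsule_len = ctrl->ioccsz * 16.  A write can carry
 * its data inline in the capsule only if it fits after the SQE:
 */
static inline size_t nvme_tcp_inline_data_size(struct nvme_tcp_queue *queue)
{
	return queue->cmnd_capsule_len - sizeof(struct nvme_command);
}

If the controller advertises a small ioccsz, none of the 16K writes go
inline and every request needs an extra R2T round trip before its data
PDUs go out, which changes how much work a single send pass does.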