devices will benefit from the batching so maybe the flag needs to be
inverted? BLK_MQ_F_DONT_BATCHING_SUBMISSION?
Actually I'd rather to not add any flag, and we may use some algorithm
(maybe EWMA or other intelligent stuff, or use hctx->dispatch_busy directly)
to figure out one dynamic batching number which is used to dequeue requests
from scheduler or sw queue.
Then just one single dispatch io code path is enough.