On Thu, Mar 31, 2016 at 09:39:25PM -0600, Jens Axboe wrote:
On 03/31/2016 09:29 PM, Jens Axboe wrote:
I can't seem to reproduce this at all. On an nvme device, I get a
fairly steady 60K/sec file creation rate, and we're nowhere near
being IO bound. So the throttling has no effect at all.
That's too slow to show the stalls - you're likely concurrency bound
in allocation by the default AG count (4) from mkfs. Use mkfs.xfs -d
agcount=32 so that every thread works in its own AG.
That's the key, with that I get 300-400K ops/sec instead. I'll run some
testing with this tomorrow and see what I can find. It did one full run
now and I didn't see any issues, but I need to run it at various
settings and see if I can find the issue.
No stalls seen, I get the same performance with it disabled and with
it enabled, at both default settings, and lower ones
(wb_percent=20). Looking at iostat, we don't drive a lot of depth,
so it makes sense, even with the throttling we're doing essentially
the same amount of IO.
Try appending numa=fake=4 to your guest's kernel command line.
(that's what I'm using)
What does 'nr_requests' say for your virtio_blk device? Looks like
virtio_blk has a queue_depth setting, but it's not set by default,
and then it uses the free entries in the ring. But I don't know what
that is...
$ cat /sys/block/vdc/queue/nr_requests
128
I'll try the "don't throttle REQ_META" patch, but this seems like a
fragile way to solve this problem - it shuts up the messenger, but
doesn't solve the problem for any other subsystem that might have a
similar issue. e.g. next we're going to have to make sure direct IO
(which is also REQ_WRITE dispatch) does not get throttled, and so
on....
It seems to me that the right thing to do here is add a separate
classification flag for IO that can be throttled, e.g.
REQ_WRITEBACK, and only background writeback work sets this flag.
That would ensure that when the IO is being dispatched from other
sources (e.g. fsync, sync_file_range(), direct IO, filesystem
metadata, etc) it is clear that it is not a target for throttling.
This would also allow us to easily switch off throttling if
writeback is occurring for memory reclaim reasons, and so on.
Throttling policy decisions belong above the block layer, even
though the throttle mechanism itself is in the block layer.
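
To make the difference between the two approaches concrete, here is a
minimal userspace sketch, not kernel code. The SKETCH_* flag names, their
bit values, and both helper functions are hypothetical and exist only for
illustration; they do not correspond to any real block layer definitions.
It contrasts an opt-out check (throttle every write unless it is known to
be exempt) with an opt-in check (throttle only what the submitter has
marked as background writeback).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
enum {
    SKETCH_REQ_WRITE     = 1u << 0, /* any write dispatch */
    SKETCH_REQ_META      = 1u << 1, /* filesystem metadata IO */
    SKETCH_REQ_WRITEBACK = 1u << 2, /* assumed: set only by background writeback */
};

/*
 * Opt-out policy: every write is a throttle candidate unless it carries
 * a known exemption. Each new exempt source needs another special case.
 */
static bool throttle_opt_out(uint32_t flags)
{
    if (!(flags & SKETCH_REQ_WRITE))
        return false;
    return !(flags & SKETCH_REQ_META);
}

/*
 * Opt-in policy: only IO that the submitter explicitly classified as
 * background writeback is a throttle candidate. Everything else
 * (fsync, direct IO, metadata, ...) passes through untouched.
 */
static bool throttle_opt_in(uint32_t flags)
{
    return (flags & SKETCH_REQ_WRITE) && (flags & SKETCH_REQ_WRITEBACK);
}
```

With these definitions, a plain write with no other flags (standing in for
direct IO) is throttled by throttle_opt_out() but not by throttle_opt_in(),
which is the fragility described above: the opt-out form has to learn about
every exempt source one at a time, while the opt-in form leaves the
decision with whoever submits the IO.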