RE: [RFC PATCH 00/10] Multi-queue support for xen-block driver

From: Felipe Franciosi
Date: Wed Feb 18 2015 - 13:22:23 EST

> -----Original Message-----
> From: Bob Liu [mailto:bob.liu@xxxxxxxxxx]
> Sent: 15 February 2015 08:19
> To: xen-devel@xxxxxxxxxxxxx
> Cc: David Vrabel; linux-kernel@xxxxxxxxxxxxxxx; Roger Pau Monne;
> konrad.wilk@xxxxxxxxxx; Felipe Franciosi; axboe@xxxxxx; hch@xxxxxxxxxxxxx;
> avanzini.arianna@xxxxxxxxx; Bob Liu
> Subject: [RFC PATCH 00/10] Multi-queue support for xen-block driver
> This patchset convert the Xen PV block driver to the multi-queue block layer API
> by sharing and using multiple I/O rings between the frontend and backend.
> History:
> It's based on the result of Arianna's internship for GNOME's Outreach Program
> for Women, in which she was mentored by Konrad Rzeszutek Wilk. I also
> worked on this patchset with her at that time, and now fully take over this task.
> I've got her authorization to "change authorship or SoB to the patches as you
> like."
> A few words on block multi-queue layer:
> Multi-queue block layer improved block scalability a lot by split single request
> queue to per-processor software queues and hardware dispatch queues. The
> linux blk-mq API will handle software queues, while specific block driver must
> deal with hardware queues.

IIUC, the main motivation behind the blk-mq work was lock contention on a block device's request queue when it is accessed concurrently from different NUMA nodes. I believe we are not putting enough emphasis on the main benefit of taking this approach on Xen.

Many modern storage systems (e.g. NVMe devices) respond much better (especially in terms of IOPS) to a high number of outstanding requests. That can be achieved by having a single thread sustain a high IO depth _and/or_ by having several threads issue requests at the same time. The former approach is often limited by CPU capacity: the (v)CPU that the single thread runs on can only handle so many interrupts (easily observed with 'top' showing the thread pegged at 100%). The latter approach is more flexible, given that many threads can run across several different (v)CPUs. I have a lot of data on this topic and am happy to share it if people are interested.

We can therefore use the multi-queue block layer in a guest to have more than one request queue associated with blkfront. These queues can be mapped over several rings to the backend, making it very easy for us to run multiple threads on the backend for a single virtual disk. I believe this is why Bob is seeing massive improvements when running 'fio' in a guest with an increased number of jobs.
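
To make the queue-to-ring idea concrete, here is a rough sketch of how submitting CPUs could be spread over the available rings. This is only an illustration: blk-mq computes the real CPU-to-hardware-queue mapping itself, and ring_for_cpu() is a hypothetical helper that does not appear in the patches.

```c
/*
 * Illustrative only: spread submitting CPUs over the rings round-robin.
 * ring_for_cpu() is a hypothetical name, not part of blk-mq or blkfront.
 */
unsigned int ring_for_cpu(unsigned int cpu, unsigned int nr_rings)
{
    /* Treat "no rings negotiated" as the single-ring case. */
    if (nr_rings == 0)
        nr_rings = 1;

    /* CPUs 0..nr_rings-1 each get their own ring; the rest wrap around. */
    return cpu % nr_rings;
}
```

With 16 vCPUs and 16 rings, every vCPU would get a ring to itself under this scheme. With fewer rings than vCPUs, several vCPUs would share a ring, and the backend could still run one thread per ring.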

In my opinion, this should be highlighted as the main motivation for adopting blk-mq on Xen.


> The xen/block implementation:
> 1) Convert to blk-mq api with only one hardware queue.
> 2) Use more rings to act as multi hardware queues.
> 3) Negotiate number of hardware queues, the same as xen-net driver. The
> backend notify "multi-queue-max-queues" to frontend, then the front write
> back final number to "multi-queue-num-queues".
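
For readers unfamiliar with this style of negotiation, the decision in step 3 could look roughly like the sketch below. The assumptions are mine and not taken from the patches: the frontend is assumed to want some number of queues of its own (for example one per vCPU) and to clamp that to what the backend advertises, and negotiate_nr_queues() is a hypothetical helper name.

```c
/*
 * Hypothetical sketch of the frontend's choice.
 *
 * backend_max:     value the backend advertised in "multi-queue-max-queues"
 *                  (0 is used here to mean the key was absent)
 * frontend_wanted: number of queues the frontend would like (assumption:
 *                  e.g. one per vCPU)
 *
 * The return value is what the frontend would write back to
 * "multi-queue-num-queues".
 */
unsigned int negotiate_nr_queues(unsigned int backend_max,
                                 unsigned int frontend_wanted)
{
    /* Assumption: a backend without multi-queue support means one ring. */
    if (backend_max == 0)
        backend_max = 1;

    /* The frontend always needs at least one queue. */
    if (frontend_wanted == 0)
        frontend_wanted = 1;

    /* Never ask for more queues than the backend is willing to serve. */
    return frontend_wanted < backend_max ? frontend_wanted : backend_max;
}
```

For example, a backend advertising 4 queues and a 16-vCPU guest would settle on 4 under these assumptions.
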
> Test result:
> fio's IOmeter emulation on a 16 cpus domU with a null_blk device, hardware
> queue number was 16.
> nr_fio_jobs  IOPS(before)  IOPS(after)  Diff
>  1           57k           58k            0%
>  4           95k           201k         +210%
>  8           89k           372k         +410%
> 16           68k           284k         +410%
> 32           65k           196k         +300%
> 64           63k           183k         +290%
> More results are coming, there was also big improvement on both write-IOPS
> and latency.
> Any comments or suggestions are welcome.
> Thank you,
> -Bob Liu
> Bob Liu (10):
> xen/blkfront: convert to blk-mq API
> xen/blkfront: drop legacy block layer support
> xen/blkfront: reorg info->io_lock after using blk-mq API
> xen/blkfront: separate ring information to an new struct
> xen/blkback: separate ring information out of struct xen_blkif
> xen/blkfront: pseudo support for multi hardware queues
> xen/blkback: pseudo support for multi hardware queues
> xen/blkfront: negotiate hardware queue number with backend
> xen/blkback: get hardware queue number from blkfront
> xen/blkfront: use work queue to fast blkif interrupt return
> drivers/block/xen-blkback/blkback.c | 370 ++++++++-------
> drivers/block/xen-blkback/common.h  |  54 ++-
> drivers/block/xen-blkback/xenbus.c  | 415 +++++++++++------
> drivers/block/xen-blkfront.c        | 894 +++++++++++++++++++++---------------
> 4 files changed, 1018 insertions(+), 715 deletions(-)
> --
