Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

From: Michael S. Tsirkin
Date: Mon Jan 14 2019 - 15:29:45 EST


On Mon, Jan 14, 2019 at 07:12:08PM +0000, Robin Murphy wrote:
> On 14/01/2019 18:20, Michael S. Tsirkin wrote:
> > On Mon, Jan 14, 2019 at 08:41:37PM +0800, Jason Wang wrote:
> > >
> > > On 2019/1/14 5:50, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:41:56PM +0800, Jason Wang wrote:
> > > > > On 2019/1/11 5:15, Joerg Roedel wrote:
> > > > > > On Fri, Jan 11, 2019 at 11:29:31AM +0800, Jason Wang wrote:
> > > > > > > Just wondering if my understanding is correct: IOMMU_PLATFORM must be set
> > > > > > > for all virtio devices under AMD-SEV guests?
> > > > > > Yes, that is correct. Emulated DMA can only happen on the SWIOTLB
> > > > > > aperture, because that memory is not encrypted. The guest then bounces
> > > > > > the data to its encrypted memory.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Joerg
> > > > >
> > > > > Thanks, have you tested vhost-net in this case? I suspect it may not work.
> > > > Which brings me back to my pet peeve: we need to take action so that
> > > > virtio uses the proper DMA mapping API by default, with quirks for the
> > > > legacy cases. The magic bypass it uses is just causing problem after
> > > > problem.
> > >
> > >
> > > Yes, I fully agree with you. This is probably an exact example of such
> > > a problem.
> > >
> > > Thanks
> >
> > I don't think so - the issue is really that the DMA API does not yet
> > handle the SEV case 100% correctly. I suspect passthrough devices would
> > have the same issue.
>
> Huh? Regardless of which virtio devices use it or not, the DMA API is
> handling the SEV case as correctly as it possibly can, by forcing everything
> through the unencrypted bounce buffer. If the segments being mapped are too
> big for that bounce buffer in the first place, there's nothing it can
> possibly do except fail, gracefully or otherwise.

It seems reasonable to be able to ask it what its capabilities are,
though.
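
For illustration, such a query could look something like the sketch below.
The helper name is made up for this example (no such interface exists in
the DMA API); the only real pieces are the SWIOTLB constants, whose product
(IO_TLB_SEGSIZE slots of 1 << IO_TLB_SHIFT bytes, i.e. 128 * 2K = 256K) is
the ceiling on a single bounce-buffered mapping:

/*
 * Illustrative sketch only: a made-up helper that would let a driver ask
 * the DMA layer for the largest single mapping it can expect to succeed.
 * When all DMA is forced through SWIOTLB (as it is in an SEV guest), that
 * is bounded by the largest contiguous bounce-buffer allocation.
 */
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/swiotlb.h>

static size_t example_dma_max_mapping_size(struct device *dev)
{
	if (swiotlb_force == SWIOTLB_FORCE)
		return IO_TLB_SEGSIZE << IO_TLB_SHIFT;	/* 128 * 2K = 256K */

	return SIZE_MAX;	/* the DMA layer imposes no extra limit */
}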

> Now, in theory, yes, the real issue at hand is not unique to virtio-blk nor
> SEV - any driver whose device has a sufficiently large DMA segment size and
> who manages to get sufficient physically-contiguous memory could technically
> generate a scatterlist segment longer than SWIOTLB can handle. However, in
> practice that basically never happens, not least because very few drivers
> ever override the default 64K DMA segment limit. AFAICS nothing in
> drivers/virtio is calling dma_set_max_seg_size() or otherwise assigning any
> dma_parms to replace the defaults either, so the really interesting question
> here is how are these apparently-out-of-spec 256K segments getting generated
> at all?
>
> Robin.

I guess this is what you are looking for:

	/* Host can optionally specify maximum segment size and number of
	 * segments. */
	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
				   struct virtio_blk_config, size_max, &v);
	if (!err)
		blk_queue_max_segment_size(q, v);
	else
		blk_queue_max_segment_size(q, -1U);
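
Note the fallback: when the host does not advertise VIRTIO_BLK_F_SIZE_MAX,
the driver passes -1U, i.e. UINT_MAX, so there is effectively no segment
size limit at all.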

virtio isn't the only caller with a value >64K:

$ git grep -A1 blk_queue_max_segment_size
Documentation/block/biodoc.txt: blk_queue_max_segment_size(q, max_seg_size)
Documentation/block/biodoc.txt- Maximum size of a clustered segment, 64kB default.
--
block/blk-settings.c: * blk_queue_max_segment_size - set max segment size for blk_rq_map_sg
block/blk-settings.c- * @q: the request queue for the device
--
block/blk-settings.c:void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
block/blk-settings.c-{
--
block/blk-settings.c:EXPORT_SYMBOL(blk_queue_max_segment_size);
block/blk-settings.c-
--
drivers/block/mtip32xx/mtip32xx.c: blk_queue_max_segment_size(dd->queue, 0x400000);
drivers/block/mtip32xx/mtip32xx.c- blk_queue_io_min(dd->queue, 4096);
--
drivers/block/nbd.c: blk_queue_max_segment_size(disk->queue, UINT_MAX);
drivers/block/nbd.c- blk_queue_max_segments(disk->queue, USHRT_MAX);
--
drivers/block/ps3disk.c: blk_queue_max_segment_size(queue, dev->bounce_size);
drivers/block/ps3disk.c-
--
drivers/block/ps3vram.c: blk_queue_max_segment_size(queue, BLK_MAX_SEGMENT_SIZE);
drivers/block/ps3vram.c- blk_queue_max_hw_sectors(queue, BLK_SAFE_MAX_SECTORS);
--
drivers/block/rbd.c: blk_queue_max_segment_size(q, UINT_MAX);
drivers/block/rbd.c- blk_queue_io_min(q, objset_bytes);
--
drivers/block/sunvdc.c: blk_queue_max_segment_size(q, PAGE_SIZE);
drivers/block/sunvdc.c-
--
drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, v);
drivers/block/virtio_blk.c- else
drivers/block/virtio_blk.c: blk_queue_max_segment_size(q, -1U);
drivers/block/virtio_blk.c-
--
drivers/block/xen-blkfront.c: blk_queue_max_segment_size(rq, PAGE_SIZE);
drivers/block/xen-blkfront.c-
--
drivers/cdrom/gdrom.c: blk_queue_max_segment_size(gd.gdrom_rq, 0x40000);
drivers/cdrom/gdrom.c- gd.disk->queue = gd.gdrom_rq;
--
drivers/memstick/core/ms_block.c: blk_queue_max_segment_size(msb->queue,
drivers/memstick/core/ms_block.c- MS_BLOCK_MAX_PAGES * msb->page_size);
--
drivers/memstick/core/mspro_block.c: blk_queue_max_segment_size(msb->queue,
drivers/memstick/core/mspro_block.c- MSPRO_BLOCK_MAX_PAGES * msb->page_size);
--
drivers/mmc/core/queue.c: blk_queue_max_segment_size(mq->queue, host->max_seg_size);
drivers/mmc/core/queue.c-
--
drivers/s390/block/dasd.c: blk_queue_max_segment_size(q, PAGE_SIZE);
drivers/s390/block/dasd.c- blk_queue_segment_boundary(q, PAGE_SIZE - 1);
--
drivers/scsi/be2iscsi/be_main.c: blk_queue_max_segment_size(sdev->request_queue, 65536);
drivers/scsi/be2iscsi/be_main.c- return 0;
--
drivers/scsi/scsi_debug.c: blk_queue_max_segment_size(sdp->request_queue, -1U);
drivers/scsi/scsi_debug.c- if (sdebug_no_uld)
--
drivers/scsi/scsi_lib.c: blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
drivers/scsi/scsi_lib.c-
--
drivers/scsi/ufs/ufshcd.c: blk_queue_max_segment_size(q, PRDT_DATA_BYTE_COUNT_MAX);
drivers/scsi/ufs/ufshcd.c-
--
include/linux/blkdev.h:extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
include/linux/blkdev.h-extern void blk_queue_max_discard_sectors(struct request_queue *q,
--
include/linux/mmc/host.h: unsigned int max_seg_size; /* see blk_queue_max_segment_size */
include/linux/mmc/host.h- unsigned short max_segs; /* see blk_queue_max_segments */


Some of these devices are probably not going to work well if
passed through to a SEV guest.

Going back to virtio, at some level virtio is like a stacking device, so
it does not necessarily need a limit.
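
That said, if virtio-blk did want to cap things at what the bounce buffer
can handle, a minimal sketch (reusing the made-up
example_dma_max_mapping_size() helper from above, so this is an
illustration rather than a proposed patch) might look like:

	u32 max_size = -1U;	/* effectively unlimited by default */

	/* Host can optionally specify a maximum segment size. */
	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SIZE_MAX,
				   struct virtio_blk_config, size_max, &v);
	if (!err)
		max_size = v;

	/* Never exceed what a single bounce-buffered mapping can cover. */
	max_size = min_t(size_t, max_size,
			 example_dma_max_mapping_size(&vdev->dev));

	blk_queue_max_segment_size(q, max_size);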

--
MST