RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

From: Michael Kelley
Date: Tue Aug 31 2021 - 13:16:41 EST


From: Christoph Hellwig <hch@xxxxxx> Sent: Monday, August 30, 2021 5:01 AM
>
> Sorry for the delayed answer, but I looked at the vmap_pfn usage in the
> previous version and tried to come up with a better version. This
> mostly untested branch:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap
>
> gets us there for swiotlb and the channel infrastructure. I've started
> looking at the network driver but didn't get anywhere due to other work.
>
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocations for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack. Did I see that right? Are these big
> buffers actually required, unlike the normal buffer management schemes
> in other Linux network drivers?
>
> If so I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn where the
> PFN includes the share_gpa offset when actually copying from/to the
> skbs.
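
As a very rough sketch of the kmap_local_pfn() approach you describe
for the copy path -- assuming kmap_local_pfn() can map a PFN above the
boundary, using a hypothetical recv_buf_pfns[] array for the
discontiguous backing pages and the shared_gpa_boundary field this
series adds to ms_hyperv; not actual code from the series:

#include <linux/highmem.h>
#include <linux/skbuff.h>
#include <asm/mshyperv.h>

/*
 * Sketch only: copy 'len' bytes from one backing page of the
 * receive buffer into an skb, mapping the page through its alias
 * above shared_gpa_boundary.
 */
static void copy_recv_page_to_skb(struct sk_buff *skb,
				  unsigned long *recv_buf_pfns,
				  unsigned int page_idx,
				  unsigned int offset, unsigned int len)
{
	unsigned long pfn = recv_buf_pfns[page_idx] +
			    (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
	void *src = kmap_local_pfn(pfn);

	skb_put_data(skb, src + offset, len);
	kunmap_local(src);
}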

As a quick overview, I think there are four places where the
shared_gpa_boundary must be applied to adjust the guest physical
address that is used. Each requires mapping a corresponding
virtual address range. Here are the four places:

1) The so-called "monitor pages" that are a core communication
mechanism between the guest and Hyper-V. These are two single
pages, and the mapping is handled by calling memremap() for
each of the two pages. See Patch 7 of Tianyu's series; a rough
sketch follows this list.

2) The VMbus channel ring buffers. You have proposed using
your new vmap_phys_range() helper, but I don't think that works
here. More details below.

3) The network driver send and receive buffers. vmap_phys_range()
should work here.

4) The swiotlb memory used for bounce buffers. vmap_phys_range()
should work here as well.
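
For case #1, the mapping is just a memremap() of each monitor page's
physical address with shared_gpa_boundary added. A rough sketch of
mapping one of the two pages (not the exact code in Patch 7; the
shared_gpa_boundary field in ms_hyperv comes from this series):

	phys_addr_t pa = virt_to_phys(vmbus_connection.monitor_pages[i]);

	if (hv_is_isolation_supported()) {
		/* Remap the page through its alias above the boundary */
		pa += ms_hyperv.shared_gpa_boundary;
		vmbus_connection.monitor_pages[i] =
			memremap(pa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
		if (!vmbus_connection.monitor_pages[i])
			return -ENOMEM;
	}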

Case #2 above requires an unusual mapping. The ring buffer consists of a ring
buffer header page, followed by one or more pages that are the actual
ring buffer. The pages making up the actual ring buffer are mapped
twice in succession. For example, if the ring buffer has 4 pages
(one header page and three ring buffer pages), the contiguous
virtual mapping must cover these seven pages: 0, 1, 2, 3, 1, 2, 3.
The duplicate contiguous mapping allows the code that reads or
writes the actual ring buffer to ignore wrap-around, because an
access that runs off the end of the ring buffer automatically wraps
around via the mapping. The amount of data read or
written in one batch never exceeds the size of the ring buffer, and
after a batch is read or written, the read or write indices are adjusted
to put them back into the range of the first mapping of the actual
ring buffer pages. So there's method to the madness, and the
technique works pretty well. But this kind of mapping is not
amenable to using vmap_phys_range().
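
In concrete terms, the double mapping can be built with vmap_pfn()
and an explicit PFN list, roughly like the sketch below (not the
exact code in Tianyu's series; the shared_gpa_boundary field in
ms_hyperv comes from this series, and 'pages' is assumed to be a
physically contiguous allocation of page_cnt pages):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <asm/mshyperv.h>

/*
 * Sketch only: map a ring buffer of page_cnt pages (1 header page
 * plus page_cnt - 1 data pages) so that the data pages appear twice
 * in succession, with every PFN offset above shared_gpa_boundary.
 */
static void *hv_ringbuffer_double_map(struct page *pages, u32 page_cnt)
{
	u64 pfn_offset = ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT;
	unsigned long *pfns;
	void *vaddr;
	unsigned int i;

	pfns = kcalloc(page_cnt * 2 - 1, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return NULL;

	/* Header page once, then two copies of the data pages */
	pfns[0] = page_to_pfn(pages) + pfn_offset;
	for (i = 0; i < (page_cnt - 1) * 2; i++)
		pfns[i + 1] = page_to_pfn(pages) + 1 +
			      (i % (page_cnt - 1)) + pfn_offset;

	vaddr = vmap_pfn(pfns, page_cnt * 2 - 1, PAGE_KERNEL);
	kfree(pfns);
	return vaddr;
}

Because the actual ring buffer pages appear twice in the PFN list,
the mapping isn't a single contiguous physical range, which is why a
vmap_phys_range()-style interface doesn't fit.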

Michael