Re: [PATCH 1/2] usbip: Skip DMA mapping and unmapping for urb at vhci

From: Suwan Kim
Date: Mon Jul 01 2019 - 05:29:42 EST


On Fri, Jun 28, 2019 at 06:11:54PM -0600, shuah wrote:
> Hi Suwan,
>
> On 6/21/19 11:45 AM, Suwan Kim wrote:
> > vhci doesn't do dma for remote device. Actually, the real dma
> > operation is done by network card driver. So, vhci doesn't use and
> > need dma address of transfer buffer of urb.
> >
> > When vhci supports SG, it is useful to use native SG list instead
> > of mapped SG list because dma mapping function can adjust the
> > number of SG list that is urb->num_mapped_sgs.
> >
> > But hcd provides dma mapping and unmapping function by defualt.
>
> Typo "defualt"
>
> > Moreover, it causes unnecessary dma mapping and unmapping which
> > will be done again at the NIC driver and it wastes CPU cycles.
> > So, implement map_urb_for_dma and unmap_urb_for_dma function for
> > vhci in order to skip the dma mapping and unmapping procedure.
> >
>
> How did you verify that unnecessary dma map/unmap are happening?
> How many CPU cycles did you manage to reduce with this change?

DMA mapping/unmapping is not required for vhci because vhci passes
the virtual address of the buffer to the network stack, not its DMA
address. The network stack receives the virtual address from vhci,
and later the network card driver performs the DMA mapping for the
buffer. So, as far as I know, vhci never uses the DMA address of the
buffer; it only uses the virtual address.

I used ftrace to measure the duration of usb_hcd_map_urb_for_dma().
It took about 0.14us out of the roughly 10us spent in
usb_hcd_submit_urb(), which is about 1.4%. However, this figure was
measured for physically contiguous buffers, with vhci not supporting
SG. If vhci supports SG, the SG DMA mapping will consume more CPU
cycles.

I think the important point is that DMA mapping/unmapping is
unnecessary at vhci. So we can skip it and save the CPU cycles (even
if the saving is small). This is an opportunity to reduce the
end-to-end latency of usbip and improve performance.
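
To illustrate the idea, here is a minimal userspace model of the
override. It is only a sketch, not the patch itself: every model_*
name is hypothetical, the structs are simplified stand-ins for the
kernel's struct urb and struct hc_driver, and the "mapping" is faked
with a counter. The one thing it mirrors from the kernel is that the
HCD core calls the driver's map_urb_for_dma/unmap_urb_for_dma hooks
when they are set and otherwise falls back to the generic mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct urb (assumption, not the real layout). */
struct model_urb {
	void *transfer_buffer;   /* virtual address; all that vhci needs */
	uintptr_t transfer_dma;  /* bus address; only valid while mapped */
	size_t length;
};

/* Simplified stand-in for the two optional hc_driver hooks. */
struct model_hc_driver {
	int (*map_urb_for_dma)(struct model_urb *urb);
	void (*unmap_urb_for_dma)(struct model_urb *urb);
};

/* Counters so the effect of skipping the mapping is observable. */
static int model_dma_map_calls;
static int model_dma_unmap_calls;

/* Generic path: pretend to DMA-map the buffer. */
static int model_generic_map(struct model_urb *urb)
{
	model_dma_map_calls++;
	urb->transfer_dma = (uintptr_t)urb->transfer_buffer;
	return 0;
}

static void model_generic_unmap(struct model_urb *urb)
{
	model_dma_unmap_calls++;
	urb->transfer_dma = 0;
}

/* vhci-style hooks: do nothing, the NIC driver maps the data later. */
static int model_vhci_map(struct model_urb *urb)
{
	(void)urb;
	return 0;
}

static void model_vhci_unmap(struct model_urb *urb)
{
	(void)urb;
}

static const struct model_hc_driver model_vhci_driver = {
	.map_urb_for_dma = model_vhci_map,
	.unmap_urb_for_dma = model_vhci_unmap,
};

/* Model of submit + giveback: use the driver hooks when present. */
static int model_submit_and_giveback(const struct model_hc_driver *drv,
				     struct model_urb *urb)
{
	int ret;

	assert(urb->transfer_buffer != NULL);

	ret = drv->map_urb_for_dma ? drv->map_urb_for_dma(urb)
				   : model_generic_map(urb);
	if (ret)
		return ret;

	/* A real vhci would hand urb->transfer_buffer to the network
	 * stack here; transfer_dma is never read on that path. */

	if (drv->unmap_urb_for_dma)
		drv->unmap_urb_for_dma(urb);
	else
		model_generic_unmap(urb);
	return 0;
}
```

With a driver that leaves both hooks NULL, each submission goes
through one generic map and one generic unmap. With model_vhci_driver
neither counter changes, which is the saving described above.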

Regards

Suwan Kim