Re: [Xen-devel] [RFC 21/23] net/xen-netback: Make it running on 64KB page granularity

From: Wei Liu
Date: Fri May 15 2015 - 11:32:07 EST


On Fri, May 15, 2015 at 01:35:42PM +0100, Julien Grall wrote:
> Hi Wei,
>
> Thanks you for the review.
>
> On 15/05/15 03:35, Wei Liu wrote:
> > On Thu, May 14, 2015 at 06:01:01PM +0100, Julien Grall wrote:
> >> The PV network protocol is using 4KB page granularity. The goal of this
> >> patch is to allow a Linux using 64KB page granularity working as a
> >> network backend on a non-modified Xen.
> >>
> >> It's only necessary to adapt the ring size and break skb data in small
> >> chunk of 4KB. The rest of the code is relying on the grant table code.
> >>
> >> Although only simple workload is working (dhcp request, ping). If I try
> >> to use wget in the guest, it will stall until a tcpdump is started on
> >> the vif interface in DOM0. I wasn't able to find why.
> >>
> >
> > I think in wget workload you're more likely to break down 64K pages to
> > 4K pages. Some of your calculation of mfn, offset might be wrong.
>
> If so, why tcpdump on the vif interface would make wget suddenly
> working? Does it make netback use a different path?

No, but if might make core network component behave differently, this is
only my suspicion.

Do you see malformed packets with tcpdump?

>
> >> I have not modified XEN_NETBK_RX_SLOTS_MAX because I wasn't sure what
> >> it's used for (I have limited knowledge on the network driver).
> >>
> >
> > This is the maximum slots a guest packet can use. AIUI the protocol
> > still works on 4K granularity (you break 64K page to a bunch of 4K
> > pages), you don't need to change this.
>
> 1 slot = 1 grant right? If so, XEN_NETBK_RX_SLOTS_MAX is based on the
> number of Linux page. So we would have to get the number for Xen page.
>

Yes, 1 slot = 1 grant. I see what you're up to now. Yes, you need to
change this constant to match underlying HV page.

> Although, I gave a try to multiple by XEN_PFN_PER_PAGE (4KB/64KB = 16)
> but it get stuck in the loop.
>

I don't follow. What is the new #define? Which loop does it get stuck?

> >> ---
> >> drivers/net/xen-netback/common.h | 7 ++++---
> >> drivers/net/xen-netback/netback.c | 27 ++++++++++++++-------------
> >> 2 files changed, 18 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> >> index 8a495b3..0eda6e9 100644
> >> --- a/drivers/net/xen-netback/common.h
> >> +++ b/drivers/net/xen-netback/common.h
> >> @@ -44,6 +44,7 @@
> >> #include <xen/interface/grant_table.h>
> >> #include <xen/grant_table.h>
> >> #include <xen/xenbus.h>
> >> +#include <xen/page.h>
> >> #include <linux/debugfs.h>
> >>
> >> typedef unsigned int pending_ring_idx_t;
> >> @@ -64,8 +65,8 @@ struct pending_tx_info {
> >> struct ubuf_info callback_struct;
> >> };
> >>
> >> -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
> >> -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
> >> +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
> >> +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
> >>
> >> struct xenvif_rx_meta {
> >> int id;
> >> @@ -80,7 +81,7 @@ struct xenvif_rx_meta {
> >> /* Discriminate from any valid pending_idx value. */
> >> #define INVALID_PENDING_IDX 0xFFFF
> >>
> >> -#define MAX_BUFFER_OFFSET PAGE_SIZE
> >> +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
> >>
> >> #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
> >>
> >> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> >> index 9ae1d43..ea5ce84 100644
> >> --- a/drivers/net/xen-netback/netback.c
> >> +++ b/drivers/net/xen-netback/netback.c
> >> @@ -274,7 +274,7 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
> >> {
> >> struct gnttab_copy *copy_gop;
> >> struct xenvif_rx_meta *meta;
> >> - unsigned long bytes;
> >> + unsigned long bytes, off_grant;
> >> int gso_type = XEN_NETIF_GSO_TYPE_NONE;
> >>
> >> /* Data must not cross a page boundary. */
> >> @@ -295,7 +295,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
> >> if (npo->copy_off == MAX_BUFFER_OFFSET)
> >> meta = get_next_rx_buffer(queue, npo);
> >>
> >> - bytes = PAGE_SIZE - offset;
> >> + off_grant = offset & ~XEN_PAGE_MASK;
> >> + bytes = XEN_PAGE_SIZE - off_grant;
> >> if (bytes > size)
> >> bytes = size;
> >>
> >> @@ -314,9 +315,9 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
> >> } else {
> >> copy_gop->source.domid = DOMID_SELF;
> >> copy_gop->source.u.gmfn =
> >> - virt_to_mfn(page_address(page));
> >> + virt_to_mfn(page_address(page) + offset);
> >> }
> >> - copy_gop->source.offset = offset;
> >> + copy_gop->source.offset = off_grant;
> >>
> >> copy_gop->dest.domid = queue->vif->domid;
> >> copy_gop->dest.offset = npo->copy_off;
> >> @@ -747,7 +748,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
> >> first->size -= txp->size;
> >> slots++;
> >>
> >> - if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
> >> + if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
> >> netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %x, size: %u\n",
> >> txp->offset, txp->size);
> >> xenvif_fatal_tx_err(queue->vif);
> >> @@ -1241,11 +1242,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> >> }
> >>
> >> /* No crossing a page as the payload mustn't fragment. */
> >> - if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
> >> + if (unlikely((txreq.offset + txreq.size) > XEN_PAGE_SIZE)) {
> >> netdev_err(queue->vif->dev,
> >> "txreq.offset: %x, size: %u, end: %lu\n",
> >> txreq.offset, txreq.size,
> >> - (txreq.offset&~PAGE_MASK) + txreq.size);
> >> + (txreq.offset&~XEN_PAGE_MASK) + txreq.size);
> >> xenvif_fatal_tx_err(queue->vif);
> >> break;
> >> }
> >> @@ -1287,7 +1288,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> >> virt_to_mfn(skb->data);
> >
> > You didn't change the calculation of MFN here. I think it returns the
> > MFN of the first 4K sub-page of that 64K page. Do I miss anything?
>
> There is no change required. On ARM virt_to_mfn is implemented with:
>
> pfn_to_mfn(virt_to_phys(v) >> XEN_PAGE_SHIFT)
>
> which will return a 4KB PFN (see patch #23).
>

OK. I missed that patch.

> >
> >> queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
> >> queue->tx_copy_ops[*copy_ops].dest.offset =
> >> - offset_in_page(skb->data);
> >> + offset_in_page(skb->data) & ~XEN_PAGE_MASK;
> >>
> >> queue->tx_copy_ops[*copy_ops].len = data_len;
> >> queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
> >> @@ -1366,8 +1367,8 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
> >
> > This function is to coalesce frag_list to a new SKB. It's completely
> > fine to use the natural granularity of backend domain. The way you
> > modified it can lead to waste of memory, i.e. you only use first 4K of a
> > 64K page.
>
> Thanks for explaining. I wasn't sure how the function works so I change
> it for safety. I will redo the change.
>
> FWIW, I'm sure there is other place in netback where we waste memory
> with 64KB page granularity (such as grant table). I need to track them.
>
> Let me know if you have some place in mind where the memory usage can be
> improved.
>

I was about to say the mmap_pages array is an array of pages. But that
probably belongs to grant table driver.

Wei.

> >> return -ENOMEM;
> >> }
> >>
> >> - if (offset + PAGE_SIZE < skb->len)
> >> - len = PAGE_SIZE;
> >> + if (offset + XEN_PAGE_SIZE < skb->len)
> >> + len = XEN_PAGE_SIZE;
> >> else
> >> len = skb->len - offset;
> >> if (skb_copy_bits(skb, offset, page_address(page), len))
> >> @@ -1396,7 +1397,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
> >> /* Fill the skb with the new (local) frags. */
> >> memcpy(skb_shinfo(skb)->frags, frags, i * sizeof(skb_frag_t));
> >> skb_shinfo(skb)->nr_frags = i;
> >> - skb->truesize += i * PAGE_SIZE;
> >> + skb->truesize += i * XEN_PAGE_SIZE;
> >
> > The true size accounts for the actual memory occupied by this SKB. Since
> > the page is allocated with alloc_page, the granularity should be
> > PAGE_SIZE not XEN_PAGE_SIZE.
>
> Ok. I will replace with PAGE_SIZE.
>
> Regards,
>
> --
> Julien Grall
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/