RE: [virtio-dev] Re: [PATCH v9 2/5] virtio-balloon: VIRTIO_BALLOON_F_BALLOON_CHUNKS

From: Wang, Wei W
Date: Wed Apr 26 2017 - 07:04:15 EST


Hi Michael, could you please give some feedback?

On Monday, April 17, 2017 11:35 AM, Wei Wang wrote:
> On 04/15/2017 05:38 AM, Michael S. Tsirkin wrote:
> > On Fri, Apr 14, 2017 at 04:37:52PM +0800, Wei Wang wrote:
> >> On 04/14/2017 12:34 AM, Michael S. Tsirkin wrote:
> >>> On Thu, Apr 13, 2017 at 05:35:05PM +0800, Wei Wang wrote:
> >>>
> >>> So we don't need the bitmap to talk to host, it is just a data
> >>> structure we chose to maintain lists of pages, right?
> >> Right. bitmap is the way to gather pages to chunk.
> >> It's only needed in the balloon page case.
> >> For the unused page case, we don't need it, since the free page
> >> blocks are already chunks.
> >>
> >>> OK as far as it goes but you need much better isolation for it.
> >>> Build a data structure with APIs such as _init, _cleanup, _add,
> >>> _clear, _find_first, _find_next.
> >>> Completely unrelated to pages, it just maintains bits.
> >>> Then use it here.
> >>>
> >>>
> >>>> static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> >>>> module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> >>>> MODULE_PARM_DESC(oom_pages, "pages to free on OOM"); @@ -50,6
> >>>> +54,10 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> >>>> static struct vfsmount *balloon_mnt;
> >>>> #endif
> >>>> +/* Types of pages to chunk */
> >>>> +#define PAGE_CHUNK_TYPE_BALLOON 0
> >>>> +
> >>> Doesn't look like you are ever adding more types in this patchset.
> >>> Pls keep code simple, generalize it later.
> >>>
> >> "#define PAGE_CHUNK_TYPE_UNUSED 1" is added in another patch.
> > I would say add the extra code there too. Or maybe we can avoid adding
> > it altogether.
>
> I'm trying to have the two features( i.e. "balloon pages" and "unused pages")
> decoupled while trying to use common functions to deal with the commonalities.
> That's the reason to define the above macro.
> Without the macro, we will need to have separate functions, for example,
> instead of one "add_one_chunk()", we need to have
> add_one_balloon_page_chunk() and add_one_unused_page_chunk(), and some
> of the implementations will be kind of duplicate in the two functions.
> Probably we can add it when the second feature comes to the code.
>
> >
> >> Types of page to chunk are treated differently. Different types of
> >> page chunks are sent to the host via different protocols.
> >>
> >> 1) PAGE_CHUNK_TYPE_BALLOON: Ballooned (i.e. inflated/deflated) pages
> >> to chunk. For the ballooned type, it uses the basic chunk msg format:
> >>
> >> virtio_balloon_page_chunk_hdr +
> >> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> >>
> >> 2) PAGE_CHUNK_TYPE_UNUSED: unused pages to chunk. It uses this miscq
> >> msg
> >> format:
> >> miscq_hdr +
> >> virtio_balloon_page_chunk_hdr +
> >> virtio_balloon_page_chunk * MAX_PAGE_CHUNKS
> >>
> >> The chunk msg is actually the payload of the miscq msg.
> >>
> >>
> > So just combine the two message formats and then it'll all be easier?
> >
>
> Yes, it'll be simple with only one msg format. But the problem I see here is that
> miscq hdr is something necessary for the "unused page"
> usage, but not needed by the "balloon page" usage. To be more precise, struct
> virtio_balloon_miscq_hdr {
> __le16 cmd;
> __le16 flags;
> };
> 'cmd' specifies the command from the miscq (I envision that miscq will be
> further used to handle other possible miscellaneous requests either from the
> host or to the host), so 'cmd' is necessary for the miscq. But the inflateq is
> exclusively used for inflating pages, so adding a command to it would be
> redundant and look a little bewildered there.
> 'flags': We currently use bit 0 of flags to indicate the completion ofa command,
> this is also useful in the "unused page" usage, and not needed by the "balloon
> page" usage.
> >>>> +#define MAX_PAGE_CHUNKS 4096
> >>> This is an order-4 allocation. I'd make it 4095 and then it's an
> >>> order-3 one.
> >> Sounds good, thanks.
> >> I think it would be better to make it 4090. Leave some space for the
> >> hdr as well.
> > And miscq hdr. In fact just let compiler do the math - something like:
> > (8 * PAGE_SIZE - sizeof(hdr)) / sizeof(chunk)
> Agree, thanks.
>
> >
> > I skimmed explanation of algorithms below but please make sure code
> > speaks for itself and add comments inline to document it.
> > Whenever you answered me inline this is where you want to try to make
> > code clearer and add comments.
> >
> > Also, pls find ways to abstract the data structure so we don't need to
> > deal with its internals all over the code.
> >
> >
> > ....
> >
> >>>> {
> >>>> struct scatterlist sg;
> >>>> + struct virtio_balloon_page_chunk_hdr *hdr;
> >>>> + void *buf;
> >>>> unsigned int len;
> >>>> - sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> >>>> + switch (type) {
> >>>> + case PAGE_CHUNK_TYPE_BALLOON:
> >>>> + hdr = vb->balloon_page_chunk_hdr;
> >>>> + len = 0;
> >>>> + break;
> >>>> + default:
> >>>> + dev_warn(&vb->vdev->dev, "%s: chunk %d of unknown
> pages\n",
> >>>> + __func__, type);
> >>>> + return;
> >>>> + }
> >>>> - /* We should always be able to add one buffer to an empty queue. */
> >>>> - virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> >>>> - virtqueue_kick(vq);
> >>>> + buf = (void *)hdr - len;
> >>> Moving back to before the header? How can this make sense?
> >>> It works fine since len is 0, so just buf = hdr.
> >>>
> >> For the unused page chunk case, it follows its own protocol:
> >> miscq_hdr + payload(chunk msg).
> >> "buf = (void *)hdr - len" moves the buf pointer to the miscq_hdr,
> >> to send the entire miscq msg.
> > Well just pass the correct pointer in.
> >
> OK. The miscq msg is
> {
> miscq_hdr;
> chunk_msg;
> }
>
> We can probably change the code like this:
>
> #define CHUNK_TO_MISCQ_MSG(chunk) (chunk - sizeof(struct
> virtio_balloon_miscq_hdr))
>
> switch (type) {
> case PAGE_CHUNK_TYPE_BALLOON:
> msg_buf = vb->balloon_page_chunk_hdr;
> msg_len = sizeof(struct virtio_balloon_page_chunk_hdr) +
> nr_chunks * sizeof(struct virtio_balloon_page_chunk_entry);
> break;
> case PAGE_CHUNK_TYPE_UNUSED:
> msg_buf = CHUNK_TO_MISCQ_MSG(vb->unused_page_chunk_hdr);
> msg_len = sizeof(struct virtio_balloon_miscq_hdr) + sizeof(struct
> virtio_balloon_page_chunk_hdr) +
> nr_chunks * sizeof(struct virtio_balloon_page_chunk_entry);
> break;
> default:
> dev_warn(&vb->vdev->dev, "%s: chunk %d of unknown pages\n",
> __func__, type);
> return;
> }
>
>
>
> >> Please check the patch for implementing the unused page chunk, it
> >> will be clear. If necessary, I can put "buf = (void *)hdr - len" from
> >> that patch.
> > Exactly. And all this pointer math is very messy. Please look for ways
> > to clean it. It's generally easy to fill structures:
> >
> > struct foo *foo = kmalloc(..., sizeof(*foo) + n * sizeof(foo->a[0]));
> > for (i = 0; i < n; ++i)
> > foo->a[i] = b;
> >
> > this is the kind of code that's easy to understand and it's obvious
> > there are no overflows and no info leaks here.
> >
> OK, will take your suggestion:
>
> struct virtio_balloon_page_chunk {
> struct virtio_balloon_page_chunk_hdr hdr;
> struct virtio_balloon_page_chunk_entry entries[]; };
>
>
> >>>> + len += sizeof(struct virtio_balloon_page_chunk_hdr);
> >>>> + len += hdr->chunks * sizeof(struct virtio_balloon_page_chunk);
> >>>> + sg_init_table(&sg, 1);
> >>>> + sg_set_buf(&sg, buf, len);
> >>>> + if (!virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL)) {
> >>>> + virtqueue_kick(vq);
> >>>> + if (busy_wait)
> >>>> + while (!virtqueue_get_buf(vq, &len) &&
> >>>> + !virtqueue_is_broken(vq))
> >>>> + cpu_relax();
> >>>> + else
> >>>> + wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> >>>> + hdr->chunks = 0;
> >>> Why zero it here after device used it? Better to zero before use.
> >> hdr->chunks tells the host how many chunks are there in the payload.
> >> After the device use it, it is ready to zero it.
> > It's rather confusing. Try to pass # of chunks around in some other
> > way.
>
> Not sure if this was explained clearly - we just let the chunk msg hdr indicates
> the # of chunks in the payload. I think this should be a pretty normal usage, like
> the network UDP hdr, which uses a length field to indicate the packet length.
>
> >>>> + }
> >>>> +}
> >>>> +
> >>>> +static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq,
> >>>> + int type, u64 base, u64 size)
> >>> what are the units here? Looks like it's in 4kbyte units?
> >> what is the "unit" you referred to?
> >> This is the function to add one chunk, base pfn and size of the chunk
> >> are supplied to the function.
> >>
> > Are both size and base in bytes then?
> > But you do not send them to host as is, you shift them for some reason
> > before sending them to host.
> >
> Not in bytes actually. base is a base pfn, which is the starting address of the
> continuous pfns. Size is the chunk size, which is the number of continuous pfns.
>
> They are shifted based on the chunk format we agreed before:
>
> --------------------------------------------------------
> | Base (52 bit) | Rsvd (12 bit) |
> --------------------------------------------------------
> --------------------------------------------------------
> | Size (52 bit) | Rsvd (12 bit) |
> --------------------------------------------------------
>
>
> Here, the pfn will be the balloon page pfn (4KB).In this way, the host doesn't
> need to know PAGE_SIZE of the guest.
>
>
>
> >>>> + if (zero >= end)
> >>>> + chunk_size = end - one;
> >>>> + else
> >>>> + chunk_size = zero - one;
> >>>> +
> >>>> + if (chunk_size)
> >>>> + add_one_chunk(vb, vq,
> PAGE_CHUNK_TYPE_BALLOON,
> >>>> + pfn_start + one, chunk_size);
> >>> Still not so what does a bit refer to? page or 4kbytes?
> >>> I think it should be a page.
> >> A bit in the bitmap corresponds to a pfn of a balloon page(4KB).
> > That's a waste on systems with large page sizes, and it does not look
> > like you handle that case correctly.
>
> OK, I will change the bitmap to be PAGE_SIZE based here, instead of
> BALLOON_PAGE_SIZE based. When convert them into chunks, making it based
> on BALLOON_PAGE_SIZE.
>
>
> Best,
> Wei
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@xxxxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: virtio-dev-help@xxxxxxxxxxxxxxxxxxxx