RE: [PATCH RESEND] misc/vmw_vmci: bail out earlier on too big queue allocation

From: Jorgen Hansen
Date: Mon Dec 07 2020 - 09:42:02 EST


From: Stefano Garzarella <sgarzare@xxxxxxxxxx>
Sent: Wednesday, December 2, 2020 2:06 AM
>
> On Tue, Dec 1, 2020 at 9:21 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
> >
> > On Mon, Nov 23, 2020 at 10:01 PM Norbert Slusarek <nslusarek@xxxxxxx>
> wrote:
> > >
> > > From: Norbert Slusarek <nslusarek@xxxxxxx>
> > > Date: Mon, 23 Nov 2020 21:53:41 +0100
> > > Subject: [PATCH RESEND] misc/vmw_vmci: bail out earlier on too big
> > > queue allocation
> > >
> > > For the allocation of a queue pair in qp_host_alloc_queue() an
> > > arbitrary value can be passed for produce_size, which can lead to
> > > miscalculation of memory we'd like to allocate in one take. A
> > > warning is triggered at __alloc_pages_nodemask() in
> > > mm/page_alloc.c:4737 which aborts the false allocation, but it results in a
> VMware machine freezing for an infinite time.
> > >
> > > Signed-off-by: Norbert Slusarek <nslusarek@xxxxxxx>
> >
> > Thank you for sending a patch with such a detailed analysis and even
> > including a test case!
>
> Yeah agree, very good details!

Yes, thanks a lot for pointing this out.

> >
> > > diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c
> > > b/drivers/misc/vmw_vmci/vmci_queue_pair.c
> > > index c49065887e8f..997ff32475b2 100644
> > > --- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
> > > +++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
> > > @@ -526,6 +526,7 @@ static struct vmci_queue
> *qp_host_alloc_queue(u64 size)
> > > struct vmci_queue *queue;
> > > size_t queue_page_size;
> > > u64 num_pages;
> > > + unsigned int order;
> > > const size_t queue_size = sizeof(*queue) +
> > > sizeof(*(queue->kernel_if));
> > >
> > > if (size > SIZE_MAX - PAGE_SIZE) @@ -537,6 +538,10 @@ static
> > > struct vmci_queue *qp_host_alloc_queue(u64 size)
> > >
> > > queue_page_size = num_pages *
> > > sizeof(*queue->kernel_if->u.h.page);
> > >
> > > + order = get_order(queue_size + queue_page_size);
> > > + if (order >= MAX_ORDER)
> > > + return NULL;
> > > +
> > > queue = kzalloc(queue_size + queue_page_size, GFP_KERNEL);
> >
> > Calling kzalloc() with that user-provided size may still not be a
> > great idea, and MAX_ORDER is probably more than anyone ever needs
> here
> > (I don't know what the interface is needed for, but usually it is).
> >
> > If there is a reasonable upper bound smaller than MAX_ORDER, I would
> > suggest using that. It might also be helpful to use
> > kvzalloc()/kvfree() in this case, which can return an arbitrary number
> > of pages and suffers less from fragmentation.
>
> I don't know well vmw_vmci, but I took a brief look, and I saw that there is a
> macro (VMCI_MAX_GUEST_QP_MEMORY) used in vmci_qpair_alloc(), I'm
> not sure if we can use the same macro, but maybe something similar.
>
> Honestly I don't know where that limit comes from, maybe it was chosen as
> an arbitrary and reasonable value but I suggest to check if we can reuse the
> same macro here with some adjustments.
> I think in some way that limit is related to the max memory that the host
> should allocate.
>
> @Jorgen any thought?

The VMCI_MAX_GUEST_QP_MEMORY limit is the limit for the total amount of guest memory that can be used for VMCI queue pairs with the current revision of the VMCI virtual device, so it is the upper limit on the size of a single queue pair as well. In this function, the size parameter indicates the desired queue size of one of the queues in the queue pair, so they should be bounded by this as well. The appropriate check in the beginning of the function would be

if (size > min(VMCI_MAX_GUEST_QP_MEMORY, SIZE_MAX - PAGE_SIZE))
return NULL;

Since VMCI_MAX_GUEST_QP_MEMORY is 128MB, the actual kzalloc argument should be well below the limit imposed by MAX_ORDER, with the modification of the above check.

That this unchecked value can occur here at all is due to another bug, where the queue sizes aren't validated against the VMCI_MAX_GUEST_QP_MEMORY limit on the vmci_qp_broker_alloc callpath. We'll fix that and we can update the above check as well in the same change.

Thanks,
Jørgen