Re: [PATCH v3 2/2] vhost-vdpa: fix page pinning leakage in error path

From: Jason Wang
Date: Wed Oct 14 2020 - 07:41:34 EST



On 2020/10/14 下午2:52, Michael S. Tsirkin wrote:
On Tue, Oct 13, 2020 at 04:42:59PM -0700, si-wei liu wrote:
On 10/9/2020 7:27 PM, Jason Wang wrote:
On 2020/10/3 下午1:02, Si-Wei Liu wrote:
Pinned pages are not properly accounted particularly when
mapping error occurs on IOTLB update. Clean up dangling
pinned pages for the error path. As the inflight pinned
pages, specifically for memory region that strides across
multiple chunks, would need more than one free page for
book keeping and accounting. For simplicity, pin pages
for all memory in the IOVA range in one go rather than
have multiple pin_user_pages calls to make up the entire
region. This way it's easier to track and account the
pages already mapped, particularly for clean-up in the
error path.

Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Si-Wei Liu<si-wei.liu@xxxxxxxxxx>
---
Changes in v3:
- Factor out vhost_vdpa_map() change to a separate patch

Changes in v2:
- Fix incorrect target SHA1 referenced

drivers/vhost/vdpa.c | 119
++++++++++++++++++++++++++++++---------------------
1 file changed, 71 insertions(+), 48 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 0f27919..dad41dae 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -595,21 +595,19 @@ static int
vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
struct vhost_dev *dev = &v->vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
- unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
+ struct vm_area_struct **vmas;
unsigned int gup_flags = FOLL_LONGTERM;
- unsigned long npages, cur_base, map_pfn, last_pfn = 0;
- unsigned long locked, lock_limit, pinned, i;
+ unsigned long map_pfn, last_pfn = 0;
+ unsigned long npages, lock_limit;
+ unsigned long i, nmap = 0;
u64 iova = msg->iova;
+ long pinned;
int ret = 0;
if (vhost_iotlb_itree_first(iotlb, msg->iova,
msg->iova + msg->size - 1))
return -EEXIST;
- page_list = (struct page **) __get_free_page(GFP_KERNEL);
- if (!page_list)
- return -ENOMEM;
-
if (msg->perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
@@ -617,61 +615,86 @@ static int
vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
if (!npages)
return -EINVAL;
+ page_list = kvmalloc_array(npages, sizeof(struct page *),
GFP_KERNEL);
+ vmas = kvmalloc_array(npages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
This will result high order memory allocation which was what the code
tried to avoid originally.

Using an unlimited size will cause a lot of side effects consider VM or
userspace may try to pin several TB of memory.
Hmmm, that's a good point. Indeed, if the guest memory demand is huge or the
host system is running short of free pages, kvmalloc will be problematic and
less efficient than the __get_free_page implementation.
OK so ... Jason, what's the plan?

How about you send a patchset with
1. revert this change
2. fix error handling leak


Work for me, but it looks like siwei want to do this.

So it's better for to send the patchset.

Thanks