Re: [PATCH] vfio iommu type1: no need to check task->mm if task has been destroyed

From: Alex Williamson
Date: Thu Apr 19 2018 - 15:54:59 EST


On Thu, 19 Apr 2018 10:19:26 -0600
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> [cc +Kirti]
>
> On Wed, 18 Apr 2018 18:55:45 +0800
> Xu Yandong <xuyandong2@xxxxxxxxxx> wrote:
>
> > The task structure in vfio_dma struct used to identify the same
> > task who map it or other task who shares same adress space is
> > allowed to unmap. But if the task who map it has exited, mm of
> > the task has been set to null, we should unmap the vfio dma directly.
> >
> > Signed-off-by: Xu Yandong <xuyandong2@xxxxxxxxxx>
> > ---
> > Hi all,
> > When I unplug a vcpu from a VM lanched with a VFIO hostdev device,
> > I found that the *vfio_dma* mapped by this vcpu task could not be unmaped
> > in the future, so I send this patch to unmap vfio_dma directly if the
> > task who mapped it has exited.
> >
> > Howerver this patch may introduce a new security risk because any task can
> > unmap the *vfio_dma* if the mapper task has exited.
>
> Well that's unexpected, but adding some debugging code I can clearly
> see that the map and unmap ioctls are typically called by the various
> processor threads, which all share the same mm_struct (so accounting is
> correct regardless of which CPU does the unmap). I don't think the fix
> below is correct though, it's not for a security risk, but for
> accounting issue and correctness issues. The pages are mapped and
> accounted against the users locked memory limits, if we simply bail
> out, both the IOMMU mappings and the limit accounting are wrong.
> Perhaps rather than referencing the calling task_struct in the vfio_dma
> on mapping, we should traverse to the highest parent task sharing the
> same mm_struct. Kirti, any thoughts since this code originated for
> mdev support? Thanks,

I think something like below is a better solution. More research
required on group_leader and if it needs to be sanity tested or if
testing mm_struct is redundant, but I think it should resolve the
failing test case, all mappings reference the same task regardless of
which vCPU triggers it. Thanks,

Alex

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5c212bf29640..3a1d3695c3fb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1093,6 +1093,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
int ret = 0, prot = 0;
uint64_t mask;
struct vfio_dma *dma;
+ struct task_struct *task;

/* Verify that none of our __u64 fields overflow */
if (map->size != size || map->vaddr != vaddr || map->iova != iova)
@@ -1131,8 +1132,12 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
dma->iova = iova;
dma->vaddr = vaddr;
dma->prot = prot;
- get_task_struct(current);
- dma->task = current;
+
+ task = (current->mm == current->group_leader->mm ?
+ current->group_leader : current);
+ get_task_struct(task);
+ dma->task = task;
+
dma->pfn_list = RB_ROOT;

/* Insert zero-sized and grow as we map chunks of it */