Re: [PATCH] vfio: remove useless judgement
From: Steven Sistare
Date: Tue Jun 28 2022 - 08:48:25 EST

For cpr, the old qemu directly execs the new qemu, so the task does not
change. To support fork+exec, the ownership test needs to be deleted or
modified.

Pinned page accounting is another issue, as the parent counts pins in its
mm->locked_vm. If the child unmaps, it cannot simply decrement its own
mm->locked_vm counter. As you and I have discussed, the count is also
wrong in the direct exec model, because exec clears mm->locked_vm. I am
thinking vfio could count pins in the user_struct's locked_vm to handle
both models. The user struct and its count would persist across direct
exec, and be shared by parent and child for fork+exec. However, that does
change the RLIMIT_MEMLOCK value that applications must set, because the
limit must accommodate vfio plus the other subsystems that count in
user->locked_vm, which include io_uring, skbuff, xdp, and perf. Plus, the
limit must accommodate all processes of that user, not just a single
process.
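
To make the accounting problem concrete, here is a minimal userspace
sketch, not kernel code. All toy_* names are hypothetical stand-ins for
mm_struct, user_struct and the task, and the only behavior assumed is
what is described above: exec and fork start with a zeroed per-mm count,
while the per-user count is shared and persists.

```c
#include <stdbool.h>

/* Hypothetical toy stand-ins; these are not the kernel definitions. */
struct toy_user { long locked_vm; };	/* models user->locked_vm */
struct toy_mm   { long locked_vm; };	/* models mm->locked_vm   */
struct toy_task { struct toy_mm mm; struct toy_user *user; };

/* Charge npages to a counter, refusing if that would exceed the limit. */
static bool toy_charge(long *counter, long npages, long limit)
{
	if (*counter + npages > limit)
		return false;
	*counter += npages;
	return true;
}

/* Undo a charge when the pages are unpinned. */
static void toy_uncharge(long *counter, long npages)
{
	*counter -= npages;
}

/* exec installs a new address space, so the per-mm count starts over. */
static void toy_exec(struct toy_task *t)
{
	t->mm.locked_vm = 0;
}

/* fork gives the child a fresh per-mm count but the same user. */
static struct toy_task toy_fork(const struct toy_task *parent)
{
	struct toy_task child = { { 0 }, parent->user };
	return child;
}
```

In this model, a child that unpins 100 pages the parent pinned drives its
own mm count to -100 while the parent's stays at 100, and a direct exec
resets the mm count to 0 with the pages still pinned. The shared user
count stays consistent in both cases, but two tasks of the same user that
each pin 100 pages now need a limit of at least 200.
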

Folks like fork+exec because it allows recovery if the new qemu process
fails to initialize. One can fall back to the original process, if the
above issues are fixed.
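
The fallback idea can be sketched in userspace as follows. This is only
an illustration under assumptions: try_upgrade() is a hypothetical
helper, and the child's exit status stands in for whatever "initialized
successfully" handshake a real live update would use. It says nothing
about how qemu actually does this.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that execs new_binary, then wait
 * for it. Returns true only if the child exited with status 0. On any
 * failure the caller is still running and can carry on as before.
 */
static bool try_upgrade(const char *new_binary)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return false;		/* fork failed, nothing changed */

	if (pid == 0) {
		/* Child: replace this image with the new binary. */
		execl(new_binary, new_binary, (char *)NULL);
		_exit(127);		/* only reached if exec failed */
	}

	/* Parent: wait for the child, retrying if interrupted. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return false;
	}

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The point of the pattern is that the original process survives a failed
exec or a failed startup in the child, which is not the case when the
old process execs the new binary directly.
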
- Steve
On 6/27/2022 6:06 PM, Alex Williamson wrote:
>
> Hey Steve, how did you get around this for cpr or is this a gap?
> Thanks,
>
> Alex
>
> On Mon, 27 Jun 2022 11:51:09 +0800
> lizhe.67@xxxxxxxxxxxxx wrote:
>
>> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>>
>> In vfio_dma_do_unmap(), we currently prevent a process from unmapping
>> a vfio dma region if its mm_struct differs from that of vfio_dma->task.
>> In our virtual machine scenario, which uses kvm and qemu, this
>> judgement stops us from live-upgrading our qemu, which uses fork() and
>> exec() to load the new binary: the new process cannot perform the
>> VFIO_IOMMU_UNMAP_DMA action during vm exit because of this judgement.
>>
>> This judgement was added in commit 8f0d5bb95f76 ("vfio iommu type1: Add
>> task structure to vfio_dma") for security reasons. But it seems that,
>> because of resource isolation, no task unrelated to the old and new
>> processes can get hold of the same vfio_dma struct here. So this patch
>> deletes it.
>>
>> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxx>
>> ---
>> drivers/vfio/vfio_iommu_type1.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index c13b9290e357..a8ff00dad834 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1377,12 +1377,6 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>
>>  		if (!iommu->v2 && iova > dma->iova)
>>  			break;
>> -		/*
>> -		 * Task with same address space who mapped this iova range is
>> -		 * allowed to unmap the iova range.
>> -		 */
>> -		if (dma->task->mm != current->mm)
>> -			break;
>>
>>  		if (invalidate_vaddr) {
>>  			if (dma->vaddr_invalid) {
>