RE: [PATCH v3 0/6] vfio/hisilicon: add acc live migration driver

From: Tian, Kevin
Date: Fri Oct 15 2021 - 01:54:08 EST


> From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> Sent: Thursday, September 30, 2021 2:35 PM
>
> > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > Sent: 30 September 2021 01:42
> >
> > > From: Shameerali Kolothum Thodi
> > > <shameerali.kolothum.thodi@xxxxxxxxxx>
> > >
> > > > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > > > Sent: 29 September 2021 10:06
> > > >
> > > > > From: Shameerali Kolothum Thodi
> > > > > <shameerali.kolothum.thodi@xxxxxxxxxx>
> > > > >
> > > > > Hi Kevin,
> > > > >
> > > > > > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > > > > > Sent: 29 September 2021 04:58
> > > > > >
> > > > > > Hi, Shameer,
> > > > > >
> > > > > > > From: Shameer Kolothum <shameerali.kolothum.thodi@xxxxxxxxxx>
> > > > > > > Sent: Wednesday, September 15, 2021 5:51 PM
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks to the introduction of the vfio_pci_core subsystem
> > > > > > > framework[0], it is now possible to provide vendor-specific
> > > > > > > functionality for vfio-pci devices. This series attempts to
> > > > > > > add vfio live migration support for HiSilicon ACC VF devices
> > > > > > > based on the new framework.
> > > > > > >
> > > > > > > The HiSilicon ACC VF device MMIO space includes both the
> > > > > > > functional register space and the migration control register
> > > > > > > space. As discussed in RFCv1[1], this may create security
> > > > > > > issues, as these regions are shared between the Guest driver
> > > > > > > and the migration driver. Based on the feedback, we have
> > > > > > > tried to address those concerns in this version.
> > > > > >
> > > > > > This series doesn't mention anything related to dirty page
> > > > > > tracking. Are you relying on Keqian's series for utilizing the
> > > > > > hardware IOMMU dirty bit (e.g. SMMU HTTU)?
> > > > >
> > > > > Yes, this doesn't have dirty page tracking, and the plan is to
> > > > > make use of Keqian's SMMU HTTU work to improve performance. We
> > > > > have done basic sanity testing with those patches.
> > > > >
> > > >
> > > > Do you plan to support migration w/o HTTU as a fallback option?
> > > > Generally one would expect the basic functionality to be ready
> > > > before talking about optimization.
> > >
> > > Yes, the plan is to get basic live migration working first and then
> > > optimize it with SMMU HTTU when that is available.
> >
> > The interesting thing is that w/o HTTU vfio will just report every
> > pinned page as dirty, i.e. the entire guest memory is dirty. This
> > completely kills the benefit of the precopy phase, since QEMU still
> > needs to transfer the entire guest memory in the stop-copy phase.
> > This is not a 'working' model for live migration.
> >
> > So it needs to be clear whether HTTU is really an optimization or a
> > hard functional requirement for migrating such a device. If it is
> > the latter, the migration region info is not a nice-to-have thing.
>
> Yes, agree that we have to transfer the entire Guest memory in this
> case. But I don't think that is a killer here, as we would still like
> to have basic live migration enabled on these platforms, where it can
> be used when the cost of transferring the entire memory is acceptable.
>
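To make that cost concrete: with the current type1 dirty tracking,
userspace fetches the dirty bitmap via the VFIO_IOMMU_DIRTY_PAGES
ioctl, and without IOMMU dirty-bit support the kernel reports every
pinned page as dirty, so precopy degenerates into a full copy. A
minimal userspace sketch (assuming container_fd is an open type1
container, [iova, iova + size) is a mapped range, and tracking was
started earlier with VFIO_IOMMU_DIRTY_PAGES_FLAG_START):

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Fetch the dirty bitmap for [iova, iova + size); 'data' must hold
 * one bit per pgsize page, rounded up to a multiple of 8 bytes.
 */
static int get_dirty_bitmap(int container_fd, uint64_t iova,
			    uint64_t size, uint64_t pgsize,
			    uint64_t *data)
{
	struct vfio_iommu_type1_dirty_bitmap *db;
	struct vfio_iommu_type1_dirty_bitmap_get *range;
	size_t argsz = sizeof(*db) + sizeof(*range);
	int ret;

	db = calloc(1, argsz);
	if (!db)
		return -1;

	db->argsz = argsz;
	db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
	range = (struct vfio_iommu_type1_dirty_bitmap_get *)db->data;
	range->iova = iova;
	range->size = size;
	range->bitmap.pgsize = pgsize;
	range->bitmap.size = (size / pgsize + 63) / 64 * 8;
	range->bitmap.data = (__u64 *)data;

	/* W/o hardware dirty bits, every bit covering a pinned page
	 * comes back set.
	 */
	ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, db);
	free(db);
	return ret;
}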
> > btw the fallback option that I raised earlier is more like a software
> > mitigation for collecting dirty pages, e.g. analyzing the ring
> > descriptors to build software-tracked dirty info by mediating the cmd
> > portal (which requires dynamically unmapping the cmd portal from the
> > fast path to enable mediation). We are looking into this option for
> > some platforms which lack IOMMU dirty bit support.
>
> Interesting. Is there anything available publicly so that we can take a look?
>

Not yet. We once had an implementation based on an old approach, from
before vfio-pci-core was ready. I suppose it now needs rework based on
the new framework.
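
To sketch the idea (all names below are hypothetical; this is only an
illustration of the descriptor-analysis approach, not our actual
implementation): once the cmd portal page is unmapped from the fast
path, doorbell writes trap into the mediation layer, which walks the
descriptors posted since the last doorbell and records their DMA
targets in a software dirty log before forwarding the write to the
real portal:

#include <stdint.h>

/* Hypothetical helpers provided by the mediation layer. */
void mark_iova_dirty(uint64_t iova, uint32_t len);
void forward_doorbell_to_hw(uint32_t tail);

/* Hypothetical ring descriptor layout. */
struct ring_desc {
	uint64_t dma_addr;	/* IOVA the device will read/write */
	uint32_t len;
	uint32_t flags;
};

/* Called from the trap handler when the guest writes the (now
 * unmapped) doorbell register with a new tail index.
 */
static void mediate_doorbell_write(struct ring_desc *ring,
				   uint32_t ring_size,
				   uint32_t old_tail, uint32_t new_tail)
{
	uint32_t i;

	/* Every descriptor posted since the last doorbell may dirty
	 * its target buffer; log it before the device sees the work.
	 */
	for (i = old_tail; i != new_tail; i = (i + 1) % ring_size)
		mark_iova_dirty(ring[i].dma_addr, ring[i].len);

	forward_doorbell_to_hw(new_tail);
}

The obvious cost is a trap per doorbell, which is why the portal is
only unmapped from the fast path while dirty tracking is active.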

+Shaopeng, who owns this work.

Thanks
Kevin