Re: [RFC PATCH 0/3] x86: Add support for guest DMA dirty page tracking
From: Michael S. Tsirkin
Date: Tue Jan 05 2016 - 08:16:57 EST
On Tue, Jan 05, 2016 at 12:43:03PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@xxxxxxxxxx) wrote:
> > On Tue, Jan 05, 2016 at 10:45:25AM +0000, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@xxxxxxxxxx) wrote:
> > > > On Tue, Jan 05, 2016 at 10:01:04AM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Michael S. Tsirkin (mst@xxxxxxxxxx) wrote:
> > > > > > On Mon, Jan 04, 2016 at 07:11:25PM -0800, Alexander Duyck wrote:
> > > > > > > >> The two mechanisms referenced above would likely require coordination with
> > > > > > > >> QEMU and as such are open to discussion. I haven't attempted to address
> > > > > > > >> them as I am not sure there is a consensus as of yet. My personal
> > > > > > > >> preference would be to add a vendor-specific configuration block to the
> > > > > > > >> emulated pci-bridge interfaces created by QEMU that would allow us to
> > > > > > > >> essentially extend shpc to support guest live migration with pass-through
> > > > > > > >> devices.
> > > > > > > >
> > > > > > > > shpc?
> > > > > > >
> > > > > > > That is kind of what I was thinking. We basically need some mechanism
> > > > > > > to allow for the host to ask the device to quiesce. It has been
> > > > > > > proposed to possibly even look at something like an ACPI interface
> > > > > > > since I know ACPI is used by QEMU to manage hot-plug in the standard
> > > > > > > case.
> > > > > > >
> > > > > > > - Alex
> > > > > >
> > > > > >
> > > > > > Start by using hot-unplug for this!
> > > > > >
> > > > > > > Really, use your patch on the guest side, and write the host
> > > > > > > side to allow starting migration with the device, but
> > > > > > > defer completing it.
> > > > > >
> > > > > > So
> > > > > >
> > > > > > 1.- host tells guest to start tracking memory writes
> > > > > > 2.- guest acks
> > > > > > 3.- migration starts
> > > > > > 4.- most memory is migrated
> > > > > > 5.- host tells guest to eject device
> > > > > > 6.- guest acks
> > > > > > 7.- stop vm and migrate rest of state
> > > > > >
> > > > > >
> > > > > > It will already be a win since hot unplug after migration starts and
> > > > > > most memory has been migrated is better than hot unplug before migration
> > > > > > starts.
> > > > > >
> > > > > > Then measure downtime and profile. Then we can look at ways
> > > > > > to quiesce device faster which really means step 5 is replaced
> > > > > > with "host tells guest to quiesce device and dirty (or just unmap!)
> > > > > > all memory mapped for write by device".
> > > > >
> > > > >
> > > > > > Doing a hot-unplug is going to upset the guest's network stack's view
> > > > > > of the world; that's something we don't want to change.
> > > > >
> > > > > Dave
> > > >
> > > > It might but if you store the IP and restore it quickly
> > > > after migration e.g. using guest agent, as opposed to DHCP,
> > > > then it won't.
> > >
> > > I thought if you hot-unplug then it will lose any outstanding connections
> > > on that device.
> >
> > Which connections and which device? TCP connections and an ethernet
> > device? These are on different layers so of course you don't lose them.
> > Just do not change the IP address.
> >
> > Some guests send a signal to applications to close connections
> > when all links go down. One can work around this
> > in a variety of ways.
>
> So, OK, I was surprised that a simple connection didn't go down when
> I tested and just removed the network card; I'd thought stuff was more
> aggressive when there was no route.
> But as you say, some stuff does close connections when the links go down/away
> so we do need to work around that; and any new outgoing connections get
> a 'no route to host'.
You can create a dummy device in the guest for the duration of migration.
Use the guest agent to move the IP address there, and that should be enough
to trick most guests.
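
As a rough illustration of the dummy-device idea, here is a minimal sketch
that only builds the iproute2 command lines a guest-side helper might run.
The function name, the interface names (eth0, dummy0) and the address are
hypothetical placeholders, not anything defined in this thread; how the
guest agent would actually trigger this is left open.

```python
# Sketch only: builds the commands, does not execute them.
# Interface names and the address are illustrative assumptions.

def park_ip_commands(addr_cidr, real_dev, dummy_dev="dummy0"):
    """Return iproute2 commands that move addr_cidr from real_dev
    to a freshly created dummy interface."""
    return [
        # Create the placeholder interface and bring it up.
        ["ip", "link", "add", dummy_dev, "type", "dummy"],
        ["ip", "link", "set", dummy_dev, "up"],
        # Add the address to the dummy device first, so the guest
        # never sees a moment with the address missing entirely.
        ["ip", "addr", "add", addr_cidr, "dev", dummy_dev],
        # Then drop it from the device that is about to go away.
        ["ip", "addr", "del", addr_cidr, "dev", real_dev],
    ]


if __name__ == "__main__":
    for cmd in park_ip_commands("192.0.2.10/24", "eth0"):
        print(" ".join(cmd))
```

Running the printed commands would need root inside the guest, and the
reverse sequence would be needed once the replacement device shows up on
the destination.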
> So I'm still nervous what will break.
>
> Dave
I'm not saying nothing breaks. Far from it. For example, some NAT
or firewall implementations keep state per interface and these might
lose state (if using NAT/stateful firewall within guest).
So yes it *would* be useful to teach guests, for example, that a device
is "not dead, just resting" and that another device will shortly come
and take its place.
But the simple setup is already useful and worth supporting, and merging
things gradually will help this project finally get off the ground.
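
For reference, the seven-step sequence quoted earlier in this thread can be
written down as a tiny ordering check. This is only a sketch of the
ordering: the class and step names are made up for illustration, and
nothing here corresponds to an existing QEMU or kernel interface.

```python
from enum import IntEnum


class Step(IntEnum):
    # Names are hypothetical labels for steps 1-7 of the proposal.
    REQUEST_DIRTY_TRACKING = 1   # host tells guest to track memory writes
    GUEST_ACKS_TRACKING = 2      # guest acks
    MIGRATION_STARTS = 3         # migration starts
    BULK_MEMORY_MIGRATED = 4     # most memory is migrated
    REQUEST_DEVICE_EJECT = 5     # host tells guest to eject device
    GUEST_ACKS_EJECT = 6         # guest acks
    STOP_VM_AND_FINISH = 7       # stop vm and migrate rest of state


class MigrationSequence:
    """Accepts the steps strictly in the order listed above."""

    def __init__(self):
        self.done = []

    @property
    def finished(self):
        return len(self.done) == len(Step)

    def advance(self, step):
        if self.finished:
            raise RuntimeError("sequence already finished")
        expected = Step(len(self.done) + 1)
        if step != expected:
            raise RuntimeError(
                f"expected {expected.name}, got {Step(step).name}")
        self.done.append(expected)
```

The later refinement discussed in the thread (replacing the eject request
with a faster quiesce) would only change what step 5 means, not the
ordering checked here.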
> >
> > > > It allows calming the device down in a generic way,
> > > > specific drivers can then implement the fast quiesce.
> > >
> > > Except that if it breaks the guest networking it's useless.
> > >
> > > Dave
> > >
> > > >
> > > > > >
> > > > > > --
> > > > > > MST
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> > > --
> > > Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK