Re: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Wanpeng Li
Date: Thu Sep 01 2016 - 02:02:06 EST
2016-09-01 13:46 GMT+08:00 Li, Liang Z <liang.z.li@xxxxxxxxx>:
>> Subject: Re: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating
>> & fast live migration
>>
>> 2016-08-08 14:35 GMT+08:00 Liang Li <liang.z.li@xxxxxxxxx>:
>> > This patch set contains two parts of changes to the virtio-balloon.
>> >
>> > One part speeds up the inflating & deflating process. The main idea
>> > of this optimization is to use a bitmap to send the page information
>> > to the host instead of a list of PFNs, which reduces the overhead of
>> > virtio data transmission, address translation and madvise(). This
>> > improves performance by about 85%.
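[Editorial aside: the space saving behind the bitmap-vs-PFN idea can be sketched in a few lines of C. This is a minimal illustration, not the actual virtio-balloon code; the function names and the 1024-page example are hypothetical.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 1024  /* hypothetical run of contiguous free pages */

/* Describing each page with a 64-bit PFN costs 8 bytes per page. */
static size_t pfn_array_bytes(size_t npages)
{
    return npages * sizeof(uint64_t);
}

/* A bitmap keyed off a base PFN costs one bit per page. */
static size_t bitmap_bytes(size_t npages)
{
    return (npages + 7) / 8;
}

/* Mark one page as free in the bitmap, relative to the base PFN. */
static void set_pfn(uint8_t *bitmap, uint64_t base_pfn, uint64_t pfn)
{
    uint64_t off = pfn - base_pfn;

    bitmap[off / 8] |= (uint8_t)(1u << (off % 8));
}
```

For 1024 contiguous pages the PFN array needs 8192 bytes while the bitmap needs 128, a 64x reduction in data pushed through the virtqueue, which is where the transmission and translation savings come from.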
>> >
>> > The other change speeds up live migration. By skipping the guest's
>> > free pages in the first round of data copy, we avoid needless data
>> > processing, which saves a significant amount of CPU cycles and
>> > network bandwidth. We put the guest's free-page information in a
>> > bitmap and send it to the host via a virtqueue of virtio-balloon.
>> > For an idle 8GB guest, this shortens the total live migration time
>> > from about 2s to about 500ms on a 10Gbps network.
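[Editorial aside: the first-round filter described above can be sketched as follows. This is a hypothetical illustration of the technique, not the actual QEMU/virtio-balloon symbols; the function names are invented for clarity.]

```c
#include <assert.h>
#include <stdint.h>

/* Test one bit in the guest-supplied free-page bitmap. */
static int page_is_free(const uint8_t *bitmap, uint64_t pfn)
{
    return (bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

/*
 * First-round copy filter: pages flagged free by the guest are skipped
 * instead of being zero-checked and sent over the wire. Returns how
 * many pages would actually be transmitted.
 */
static uint64_t first_round_pages_to_send(const uint8_t *bitmap,
                                          uint64_t npages)
{
    uint64_t sent = 0;

    for (uint64_t pfn = 0; pfn < npages; pfn++)
        if (!page_is_free(bitmap, pfn))
            sent++;  /* would be copied to the destination */

    return sent;
}
```

For an idle guest most bits are set, so the first round degenerates to almost nothing, which is why the total-time improvement is largest in that case.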
>>
>> I just read the slides on this feature from the recent KVM Forum.
>> Cloud providers care more about live migration downtime, to avoid
>> customers noticing, than about total migration time; however, this
>> feature increases downtime in exchange for reducing the total time.
>> It would be more acceptable if there were no downside for downtime.
>>
>> Regards,
>> Wanpeng Li
>
> In theory, there is no factor that should increase the downtime. There is no additional
> operation and no extra data copy during the stop-and-copy stage. In testing, however, the
> downtime does increase, and this is reproducible. I think a busy network link may be the
> reason: with this optimization, a huge amount of data is written to the socket in a shorter
> time, so some of the write operations may have to wait. Without this optimization, zero-page
> checking takes more time and the network is less busy.
>
> If the guest is not idle, I think the gap in downtime will not be so obvious. Anyway, the
http://www.linux-kvm.org/images/c/c3/03x06B-Liang_Li-Real_Time_and_Fast_Live_Migration_Update_for_NFV.pdf
The slides show almost the same percentage for the idle and the
non-idle guests: both increase downtime by ~50%.
Regards,
Wanpeng Li