RE: [virtio-dev] RE: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
From: Parav Pandit
Date: Wed Sep 20 2023 - 00:10:57 EST
> From: Chen, Jiqian <Jiqian.Chen@xxxxxxx>
> Sent: Wednesday, September 20, 2023 9:28 AM
> >> For above purpose, we need a mechanism that allows guests and QEMU to
> >> negotiate their reset behavior. So this patch add a new parameter
> >> named
> > Freeze != reset. :)
> > Please fix it to say freeze or suspend.
> But in my virtio-gpu scenario, I want to prevent QEMU from destroying resources
> when the guest resumes (pci_pm_resume -> virtio_pci_restore ->
> virtio_device_restore -> virtio_reset_device -> vp_modern_set_status -> QEMU
> virtio_pci_reset -> virtio_gpu_gl_reset -> virtio_gpu_reset). I added a check in
> virtio_gpu_gl_reset and virtio_gpu_reset: if freeze_mode was set to FREEZE_S3
> while the guest was suspending, QEMU will not destroy resources. So the reason I
> added this mechanism is to affect the reset behavior. I think this can also help
> other virtio devices adjust their behavior, for example the virtio-video issue
> that Mikhail Golubev-Ciuchea encountered.
>
The point is that when the driver tells the device to freeze, that is a freeze command, not a reset.
So resume() should not invoke device_reset() when FREEZE+RESUME is supported.
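
A minimal sketch of the resume decision I have in mind, assuming a negotiated freeze feature. All names and values here (HYP_VIRTIO_F_FREEZE, hyp_vdev, hyp_suspend, hyp_resume) are hypothetical and are not taken from the spec or from the Linux driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: stands in for a feature bit the spec has not defined. */
#define HYP_VIRTIO_F_FREEZE (1ULL << 0)

enum hyp_resume_action {
    HYP_RESUME_UNFREEZE, /* device kept its state: only unfreeze it */
    HYP_RESUME_RESET     /* fallback path: reset and reinitialize   */
};

struct hyp_vdev {
    uint64_t negotiated; /* features negotiated at probe time       */
    bool frozen;         /* set when the driver froze the device    */
};

/* Suspend: freeze only if the device offered the (hypothetical) feature. */
static void hyp_suspend(struct hyp_vdev *vdev)
{
    assert(vdev != NULL);
    if (vdev->negotiated & HYP_VIRTIO_F_FREEZE)
        vdev->frozen = true;
}

/* Resume: a frozen device is unfrozen; it is never reset as well. */
static enum hyp_resume_action hyp_resume(struct hyp_vdev *vdev)
{
    assert(vdev != NULL);
    if ((vdev->negotiated & HYP_VIRTIO_F_FREEZE) && vdev->frozen) {
        vdev->frozen = false;
        return HYP_RESUME_UNFREEZE;
    }
    return HYP_RESUME_RESET;
}
```

With this split, the reset path keeps its existing meaning, and a device that does not offer the feature behaves exactly as it does today.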
> >
> >> freeze_mode to struct virtio_pci_common_cfg. And when guest suspends,
> >> it can write freeze_mode to be FREEZE_S3, and then virtio devices can
> >> change their reset behavior on Qemu side according to freeze_mode.
> >> What's more,
> > Not reset, but suspend behavior.
> The same reason as above.
>
Reset should not be done by the guest driver when the device supports unfreeze.
> >
> >> freeze_mode can be used for all virtio devices to affect the behavior
> >> of Qemu, not just virtio gpu device.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx>
> >> ---
> >> transport-pci.tex | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/transport-pci.tex b/transport-pci.tex index
> >> a5c6719..2543536 100644
> >> --- a/transport-pci.tex
> >> +++ b/transport-pci.tex
> >> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure
> >> layout}\label{sec:Virtio Transport
> >> le64 queue_desc; /* read-write */
> >> le64 queue_driver; /* read-write */
> >> le64 queue_device; /* read-write */
> >> + le16 freeze_mode; /* read-write */
> >> le16 queue_notif_config_data; /* read-only for driver */
> >> le16 queue_reset; /* read-write */
> >>
> > The new field cannot be in the middle of the structure.
> > Otherwise, the location of queue_notif_config_data depends on a
> > completely unrelated feature bit, breaking backward compatibility.
> > So please move it to the end.
> I am confused about this. I found this in the latest kernel code (master branch):
> struct virtio_pci_common_cfg {
> /* About the whole device. */
> __le32 device_feature_select; /* read-write */
> __le32 device_feature; /* read-only */
> __le32 guest_feature_select; /* read-write */
> __le32 guest_feature; /* read-write */
> __le16 msix_config; /* read-write */
> __le16 num_queues; /* read-only */
> __u8 device_status; /* read-write */
> __u8 config_generation; /* read-only */
>
> /* About a specific virtqueue. */
> __le16 queue_select; /* read-write */
> __le16 queue_size; /* read-write, power of 2. */
> __le16 queue_msix_vector; /* read-write */
> __le16 queue_enable; /* read-write */
> __le16 queue_notify_off; /* read-only */
> __le32 queue_desc_lo; /* read-write */
> __le32 queue_desc_hi; /* read-write */
> __le32 queue_avail_lo; /* read-write */
> __le32 queue_avail_hi; /* read-write */
> __le32 queue_used_lo; /* read-write */
> __le32 queue_used_hi; /* read-write */
>
> __le16 freeze_mode; /* read-write */
> };
> There is no queue_notif_config_data or queue_reset, and the freeze_mode I added
> is at the end. Why is it different from the virtio spec?
>
Because the Linux driver may not use the notify data, so its structure may be shorter.
I haven't dug into the code yet.
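
To illustrate the layout concern, here is a small sketch that compares field offsets when freeze_mode is placed in the middle versus at the end. The struct names (hyp_cfg_head, hyp_cfg_spec, hyp_cfg_middle, hyp_cfg_end) are made up for this example, and plain fixed-width integers stand in for the le16/le32/le64 types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the common configuration structure up to queue_device. */
struct hyp_cfg_head {
    uint32_t device_feature_select;
    uint32_t device_feature;
    uint32_t driver_feature_select;
    uint32_t driver_feature;
    uint16_t config_msix_vector;
    uint16_t num_queues;
    uint8_t  device_status;
    uint8_t  config_generation;
    uint16_t queue_select;
    uint16_t queue_size;
    uint16_t queue_msix_vector;
    uint16_t queue_enable;
    uint16_t queue_notify_off;
    uint64_t queue_desc;
    uint64_t queue_driver;
    uint64_t queue_device;
};

/* Layout as in the spec text quoted in the patch context. */
struct hyp_cfg_spec {
    struct hyp_cfg_head head;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
};

/* Layout proposed by the patch: new field in the middle. */
struct hyp_cfg_middle {
    struct hyp_cfg_head head;
    uint16_t freeze_mode;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
};

/* Requested layout: new field appended at the end. */
struct hyp_cfg_end {
    struct hyp_cfg_head head;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
    uint16_t freeze_mode;
};

static_assert(sizeof(struct hyp_cfg_head) == 56, "head has no padding");
/* Inserting in the middle moves an existing field ... */
static_assert(offsetof(struct hyp_cfg_spec, queue_notif_config_data) !=
              offsetof(struct hyp_cfg_middle, queue_notif_config_data),
              "middle insertion shifts existing fields");
/* ... while appending leaves every existing offset unchanged. */
static_assert(offsetof(struct hyp_cfg_spec, queue_notif_config_data) ==
              offsetof(struct hyp_cfg_end, queue_notif_config_data),
              "appending keeps existing offsets");
```

An existing driver that knows nothing about freeze_mode keeps reading queue_notif_config_data and queue_reset at the same offsets only in the appended layout.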
> >
> >> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure
> >> layout}\label{sec:Virtio Transport \item[\field{queue_device}]
> >> The driver writes the physical address of Device Area here.
> >> See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
> >>
> >> +\item[\field{freeze_mode}]
> >> + The driver writes this to set the freeze mode of virtio pci.
> >> + VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> >> + VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and
> >> +virtio-
> > For above names, please define the actual values in the spec.
> Ok, I will add them.
>
> >
> >> pci enters S3 suspension;
> >> + Other values are reserved for future use, like S4, etc.
> >> +
> > It cannot be just one-way communication from driver to device, as freezing a
> > device with a few hundred MB to GBs of GPU memory or other device memory can
> > take several msec.
> > Hence the driver must poll to get the acknowledgement from the device that
> > the freeze is completed.
> I think the freeze functionality itself does not have many problems. My patches just
> want to tell QEMU that the reset request comes from the guest resuming and
> not from some other scenario, by writing a status into freeze_mode, so that we
> can change the reset behavior during guest resume.
>
Either the guest should do freeze or reset, not both.
With that, each functionality has clear semantics of what exactly it does.
Freeze should not change the reset behavior.
I am not saying the freeze functionality has a problem. Freeze functionality is a request/response mechanism.
The driver requests it, and the device takes its sweet time, more than 50 nsec, to freeze a large device. It responds within 1 msec or some finite time that the freeze is done.
Then the driver can proceed to freeze the VM.
The same applies to unfreeze: the device may have to bring back a large amount of memory from some slow media.
So unfreeze should be request->response as well.
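
A rough sketch of the request->response flow described above, with a simulated device that needs several polls before it acknowledges. The mode values, the hyp_* names, and the poll-count "cost" model are all assumptions made for illustration; nothing here is defined by the spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the patch has not defined the actual numbers. */
enum { HYP_MODE_UNFREEZE = 0, HYP_MODE_FREEZE_S3 = 1 };

/* Simulated device: a request completes only after some number of polls. */
struct hyp_dev {
    uint16_t requested; /* last mode the driver asked for            */
    uint16_t current;   /* mode the device has actually reached      */
    unsigned work_left; /* polls remaining until the request is done */
};

/* Driver write: records the request; 'cost' models the device's work. */
static void hyp_write_mode(struct hyp_dev *d, uint16_t mode, unsigned cost)
{
    d->requested = mode;
    d->work_left = cost;
    if (cost == 0)
        d->current = mode;
}

/* Driver read: the device reports the new mode only once it is done. */
static uint16_t hyp_read_mode(struct hyp_dev *d)
{
    if (d->work_left > 0 && --d->work_left == 0)
        d->current = d->requested;
    return d->current;
}

/*
 * Request the mode, then poll for the acknowledgement.
 * A real driver would sleep between polls and use a time-based timeout.
 */
static bool hyp_set_mode_sync(struct hyp_dev *d, uint16_t mode,
                              unsigned cost, unsigned max_polls)
{
    assert(d != NULL);
    hyp_write_mode(d, mode, cost);
    for (unsigned i = 0; i < max_polls; i++) {
        if (hyp_read_mode(d) == mode)
            return true; /* device acknowledged the request */
    }
    return false;        /* no acknowledgement within the poll budget */
}
```

The same helper serves both freeze and unfreeze, so the driver only proceeds with suspending or resuming the VM after the device has confirmed the transition.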
> >
> >
> > You need to describe what exactly should happen in the device when it is
> > frozen.
> > Please refer to my series where infrastructure is added for device migration,
> > where the FREEZE mode behavior is defined.
> > It is similar to what you define, but it is a management plane operation
> > controlled outside of the guest VM.
> > But it is a good direction in terms of what to define in spec language.
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#u
> Thank you very much for your suggestion. I will refer to your link and then
> modify my description.
>
> >
> > You are missing the feature bit to indicate to the driver that the device supports
> > this functionality.
> > Please add one.
> Do I need to add a feature bit to DEFINE_VIRTIO_COMMON_FEATURES?
Explore the VIRTIO_F_RING_RESET touch points.
You can also explore the new patch [1], which adds a generic feature bit, to understand the spec touch points where it needs to be added.
[1] https://lore.kernel.org/virtio-comment/20230918173518.15900-1-parav@xxxxxxxxxx/T/#m9dd18d352e3ac38e0e7c82ad9a634db43dfc8b3b
> And
> each time I write the freeze_mode field on the kernel driver side, do I need to check
> this bit?
>
Yes.
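
Something along these lines, as a sketch only. The bit number, HYP_F_FREEZE_BIT, hyp_drv and the helper names are placeholders I made up for illustration; the real bit would be assigned by the spec, and a real driver would use its existing feature-check helper:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit number, NOT an assigned virtio feature bit. */
#define HYP_F_FREEZE_BIT 40

struct hyp_drv {
    uint64_t negotiated_features; /* features accepted by both sides      */
    uint16_t freeze_mode_reg;     /* stands in for the freeze_mode field  */
};

/* Returns nonzero if the given feature bit was negotiated. */
static int hyp_has_feature(const struct hyp_drv *drv, unsigned bit)
{
    return (int)((drv->negotiated_features >> bit) & 1);
}

/* Write freeze_mode only when the feature was negotiated. */
static int hyp_write_freeze_mode(struct hyp_drv *drv, uint16_t mode)
{
    assert(drv != NULL);
    if (!hyp_has_feature(drv, HYP_F_FREEZE_BIT))
        return -EOPNOTSUPP; /* device never offered the field */
    drv->freeze_mode_reg = mode;
    return 0;
}
```

Without the check, the driver could write to an offset that an older device does not implement at all.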