Re: [PATCH v2] virtio: only reset device and restore status if needed in device resume

From: Jason Wang
Date: Mon Nov 04 2024 - 22:10:27 EST


On Fri, Nov 1, 2024 at 1:23 PM Qiang Zhang <qiang4.zhang@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 01, 2024 at 10:11:11AM +0800, Jason Wang wrote:
> > On Fri, Nov 1, 2024 at 9:54 AM <qiang4.zhang@xxxxxxxxxxxxxxx> wrote:
> > >
> > > From: Qiang Zhang <qiang4.zhang@xxxxxxxxx>
> > >
> > > The virtio core unconditionally resets the device and restores its
> > > status for all virtio devices before calling the restore method. This
> > > breaks some virtio drivers which don't need to do anything in suspend
> > > and resume because they just want the device state to be retained.
> >
> > The challenge is how the driver can know that the device doesn't need a reset.
>
> Hi,
>
> Per my understanding of PM, in the suspend flow, device drivers need to
> 1. First manage/stop accesses from upper level software and
> 2. Store the volatile context into in-memory data structures.
> 3. Put devices into some low power (suspended) state.
> The resume process does the reverse.
> If a device's context won't be lost after entering some low power state
> (optional), it's OK to skip step 2.
>
> For virtio devices, the spec doesn't define whether their state will be lost
> after the platform enters a suspended state.

This is exactly what the suspend patch tries to define.

> So to work with different
> hypervisors, virtio drivers typically trigger a reset in the suspend/resume
> flow. This works fine for virtio devices if the following conditions are met:
> - Device state is fully recoverable.
> - No working behaviour is expected in the suspended state, i.e. the
> suspended state should be a sub-state of reset.
> However, the first point may be hard to implement on the driver side for some
> devices. The second point may be unacceptable for some kinds of devices.
>
> For your question, for devices whose suspended state is like the reset state,
> the hypervisor has the flexibility to retain their state or not, and the kernel
> driver can unconditionally reset them with proper re-initialization to
> achieve better compatibility. For others, the hypervisor *must* retain
> device state and the driver just keeps using it.

Right, so my question is how does the driver know the behaviour of a
device? We usually do that via a feature bit.
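
To make the feature-bit idea concrete, here is a minimal userspace sketch
(not kernel code). It assumes a hypothetical feature bit,
MODEL_F_RETAIN_STATE, that a device would offer to say its state survives
suspend. No such bit exists in the virtio spec today, and the bit number
and all model_* names are placeholders made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL bit number, chosen only for this sketch; no such
 * feature bit is defined by the virtio spec today. */
#define MODEL_F_RETAIN_STATE 62u

struct model_vdev {
	uint64_t device_features; /* bits offered by the device */
	uint64_t driver_features; /* bits the driver understands */
};

/* A feature counts only if both sides agreed on it. */
static bool model_has_feature(const struct model_vdev *vdev, unsigned int bit)
{
	assert(bit < 64);
	return ((vdev->device_features & vdev->driver_features) >> bit) & 1u;
}

/* Default stays "reset on resume" unless retention was negotiated. */
static bool model_needs_reset_on_resume(const struct model_vdev *vdev)
{
	return !model_has_feature(vdev, MODEL_F_RETAIN_STATE);
}
```

With this shape, a device that does not offer the bit (or a driver that
does not accept it) falls back to the existing reset behaviour.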

Note that the thing that matters here is the migration compatibility.

>
> >
> > For example, PCI has no_soft_reset which has been done in the commit
> > "virtio: Add support for no-reset virtio PCI PM".
> >
> > And there's an ongoing long discussion of adding suspend support in the
> > virtio spec, so that the driver knows it's safe to suspend/resume without
> > reset.
>
> That's great! Hopefully it can fill the gap.
> Currently, I think we can safely move the reset to drivers' freeze methods;
> the virtio core has no reason to treat it as a common action required by all
> devices. And the reset operation can optionally be skipped if the driver has
> hints from the device that it can retain state.

The problem here is that whether the device can be resumed without a
"soft reset" seems to be a general feature, which could be the knowledge of either

1) virtio core (a feature bit or not)

or

2) transport layer (like PCI)
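
As a rough illustration of the two options, the sketch below is a
standalone userspace model, not kernel code. The only real-world value in
it is the No_Soft_Reset bit (bit 3 of the PCI PM Control/Status register,
which Linux calls PCI_PM_CTRL_NO_SOFT_RESET). The struct, the enum and the
core_feature_negotiated flag are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* No_Soft_Reset is bit 3 of the PCI PM Control/Status register
 * (PCI_PM_CTRL_NO_SOFT_RESET in Linux's pci_regs.h). */
#define MODEL_PM_CTRL_NO_SOFT_RESET 0x0008u

enum model_hint_source {
	MODEL_HINT_NONE,      /* nothing says state is retained: reset */
	MODEL_HINT_CORE,      /* option 1: a negotiated feature bit */
	MODEL_HINT_TRANSPORT, /* option 2: e.g. PCI No_Soft_Reset */
};

struct model_hints {
	bool core_feature_negotiated; /* hypothetical feature bit result */
	bool transport_is_pci;
	uint16_t pmcsr;               /* raw PM Control/Status value */
};

static bool model_pci_no_soft_reset(uint16_t pmcsr)
{
	return (pmcsr & MODEL_PM_CTRL_NO_SOFT_RESET) != 0;
}

/* Report which layer, if any, says the device keeps its state. */
static enum model_hint_source model_retain_hint(const struct model_hints *h)
{
	assert(h);
	if (h->core_feature_negotiated)
		return MODEL_HINT_CORE;
	if (h->transport_is_pci && model_pci_no_soft_reset(h->pmcsr))
		return MODEL_HINT_TRANSPORT;
	return MODEL_HINT_NONE;
}
```

Checking the core hint first is an arbitrary choice in this sketch; which
layer should own the knowledge is exactly the open question.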

>
> >
> > >
> > > Virtio GPIO is a typical example. GPIO states should be kept unchanged
> > > after suspend and resume (e.g. output pins keep driving the output) and
> > > Virtio GPIO driver does nothing in freeze and restore methods. But the
> > > reset operation in virtio_device_restore breaks this.
> >
> > Is this mandated by GPIO or virtio spec? If yes, let's quote the relevant part.
>
> No. But in actual hardware design (e.g. Intel PCH GPIO), or from the
> requirement perspective, GPIO pin state can be (and should be) retained
> in the suspended state.
> If Virtio GPIO is used to let a VM operate such a physical GPIO chip indirectly,
> it can't be reset in suspend and resume. Meanwhile the hypervisor will
> retain pin states after suspension.
>
> >
> > >
> > > Since some devices need reset in suspend and resume while some needn't,
> > > create a new helper function for the original reset and status restore
> > > logic so that virtio drivers can invoke it in their restore method
> > > if necessary.
> >
> > How are those drivers classified?
>
> I think this depends on whether the hypervisor will keep device state during
> the platform suspend process.

So the problem is that the actual implementation (hypervisor, physical
device or mediation) is transparent to the driver. The driver needs a
general way to know whether it's safe (or not) to reset during
suspend/resume.

> I think the hypervisor should, because suspend and reset are
> conceptually two different things.

Probably, but doing a reset plus a software state load/save is common
practice for devices that will lose their state during PM.
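
A minimal sketch of that reset plus load/save pattern, as a standalone
userspace model rather than kernel code. The status bit values follow the
virtio spec. The structs, the out_pins field and the need_reset flag are
assumptions made up for this example, and feature renegotiation is reduced
to setting FEATURES_OK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio spec. */
#define MODEL_S_ACKNOWLEDGE 1u
#define MODEL_S_DRIVER      2u
#define MODEL_S_DRIVER_OK   4u
#define MODEL_S_FEATURES_OK 8u

struct model_dev {
	uint8_t status;
	uint32_t out_pins;   /* volatile device state, e.g. GPIO outputs */
};

struct model_drv {
	uint32_t saved_pins; /* in-memory copy taken at freeze time */
};

/* Reset clears status and, in this model, all volatile state. */
static void model_reset(struct model_dev *dev)
{
	dev->status = 0;
	dev->out_pins = 0;
}

/* Freeze: save the volatile context before suspending. */
static void model_freeze(struct model_drv *drv, const struct model_dev *dev)
{
	assert(drv && dev);
	drv->saved_pins = dev->out_pins;
}

/* Restore: reset and reload only when the state may have been lost. */
static void model_restore(const struct model_drv *drv, struct model_dev *dev,
			  bool need_reset)
{
	assert(drv && dev);
	if (need_reset) {
		model_reset(dev);
		dev->status |= MODEL_S_ACKNOWLEDGE | MODEL_S_DRIVER;
		dev->status |= MODEL_S_FEATURES_OK;
		dev->out_pins = drv->saved_pins; /* reload saved state */
	}
	dev->status |= MODEL_S_DRIVER_OK;
}
```

Note that in the reset branch out_pins passes through 0 before it is
reloaded, which is the window the GPIO example in the patch description is
worried about.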

Thanks

>
>
> Thanks
> Qiang
>