Re: [PATCH 06/21] vfio/pci: Retrieve preserved device files after Live Update

From: Alex Mastro

Date: Thu Dec 04 2025 - 05:30:35 EST

On Wed, Dec 03, 2025 at 09:29:27AM -0800, David Matlack wrote:
> On Wed, Dec 3, 2025 at 7:46 AM Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
> >
> > On Wed, Dec 3, 2025 at 7:55 AM Alex Mastro <amastro@xxxxxx> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 07:35:53PM +0000, David Matlack wrote:
> > > > From: Vipin Sharma <vipinsh@xxxxxxxxxx>
> > > > static int vfio_pci_liveupdate_retrieve(struct liveupdate_file_op_args *args)
> > > > {
> > > > - return -EOPNOTSUPP;
> > > > + struct vfio_pci_core_device_ser *ser;
> > > > + struct vfio_device *device;
> > > > + struct folio *folio;
> > > > + struct file *file;
> > > > + int ret;
> > > > +
> > > > + folio = kho_restore_folio(args->serialized_data);
> > > > + if (!folio)
> > > > + return -ENOENT;
> > >
> > > Should this be consistent with the behavior of pci_flb_retrieve() which panics
> > > on failure? The short circuit failure paths which follow leak the folio,
>
> Thanks for catching the leaked folio. I'll fix that in the next version.
>
> > > which seems like a hygiene issue, but the practical significance is moot if
> > > vfio_pci_liveupdate_retrieve() failure is catastrophic anyways?
> >
> > pci_flb_retrieve() is used during boot. If it fails, we risk DMA
> > corrupting any memory region, so a panic makes sense. In contrast,
> > this retrieval happens once we are already in userspace, allowing the
> > user to decide how to handle the failure to recover the preserved
> > cdev.
>
> This is what I was thinking as well. vfio_pci_liveupdate_retrieve()
> runs in the context of the ioctl LIVEUPDATE_SESSION_RETRIEVE_FD, so we
> can just return an error up to userspace if anything goes wrong and
> let userspace initiate the reboot to recover the device if/when it's
> ready.
>
> OTOH, pci_flb_retrieve() gets called by the kernel during early boot
> to determine what devices the previous kernel preserved. If the kernel
> can't determine which devices were preserved by the previous kernel
> and once the kernel starts preserving I/O page tables, that could lead
> to corruption, so panicking is warranted.

Make sense, thanks for elaborating David and Pasha.