Re: [PATCH RFC] vfio error recovery: kernel support
From: Alex Williamson
Date: Fri Jan 20 2017 - 08:54:21 EST
On Fri, 20 Jan 2017 18:13:22 +0800
Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
> On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote:
> > This is a design and an initial patch for kernel side for AER
> > support in VFIO.
> >
> > 0. What happens now (PCIE AER only)
> > Fatal errors cause a link reset.
> > Non fatal errors don't.
> > All errors stop the VM eventually, but not immediately
> > because it's detected and reported asynchronously.
> > Interrupts are forwarded as usual.
> > Correctable errors are not reported to guest at all.
> > Note: PPC EEH is different. This focuses on AER.
> >
> > 1. Correctable errors
> > I don't see a need to report these to guest. So let's not.
> >
> > 2. Fatal errors
> > It's not easy to handle them gracefully since link reset
> > is needed. As a first step, let's use the existing mechanism
> > in that case.
> >
> > 2. Non-fatal errors
> > Here we could make progress by reporting them to guest
> > and have guest handle them.
> > Issues:
> > a. this behaviour should only be enabled with new userspace
> > old userspace should work without changes
> > Suggestion: One way to address this would be to add a new eventfd
> > non_fatal_err_trigger. If not set, invoke err_trigger.
> >
> > b. drivers are supposed to stop MMIO when error is reported
> > if vm keeps going, we will keep doing MMIO/config
> > Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway,
> > so we are not making things much worse
> > Suggestion 2: try to stop MMIO/config, resume on resume call
> >
> > Patch below implements Suggestion 1.
> >
> > c. PF driver might detect that function is completely broken,
> > if vm keeps going, we will keep doing MMIO/config
> > Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway,
> > so we are not making things much worse
> > Suggestion 2: detect this and invoke err_trigger to stop VM
> >
> > Patch below implements Suggestion 2.
> >
> > Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device
> > is not attached. This seems bogus, likely based on the confusing name.
> > We probably should return PCI_ERS_RESULT_CAN_RECOVER.
> >
> > The following patch does not change that.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >
> > ---
> >
> > The patch is completely untested. Let's discuss the design first.
> > Cao jin, if this is deemed acceptable please take it from here.
> >
>
> Ok, thanks very much.
>
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index dce511f..fdca683 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1292,7 +1292,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
> >
> > mutex_lock(&vdev->igate);
> >
> > - if (vdev->err_trigger)
> > + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger)
> > + eventfd_signal(vdev->err_trigger, 1);
> > + else if (vdev->err_trigger)
> > eventfd_signal(vdev->err_trigger, 1);
> >
> > mutex_unlock(&vdev->igate);
> > @@ -1302,8 +1304,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
> > return PCI_ERS_RESULT_CAN_RECOVER;
> > }
> >
> > +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev,
> > + pci_channel_state_t state)
> > +{
> > + struct vfio_pci_device *vdev;
> > + struct vfio_device *device;
> > +
> > + device = vfio_device_get_from_dev(&pdev->dev);
> > + if (!device)
> > + goto err_dev;
> > +
> > + vdev = vfio_device_data(device);
> > + if (!vdev)
> > + goto err_dev;
> > +
> > + mutex_lock(&vdev->igate);
> > +
> > + if (vdev->err_trigger)
> > + eventfd_signal(vdev->err_trigger, 1);
> > +
> > + mutex_unlock(&vdev->igate);
> > +
> > + vfio_device_put(device);
> > +
> > +err_data:
> > + vfio_device_put(device);
> > +err_dev:
> > + return PCI_ERS_RESULT_RECOVERED;
> > +}
> > +
> > static const struct pci_error_handlers vfio_err_handlers = {
> > .error_detected = vfio_pci_aer_err_detected,
> > + .slot_reset = vfio_pci_aer_slot_reset,
> > };
> >
>
> if .slot_reset wants to be called, .error_detected should return
> PCI_ERS_RESULT_NEED_RESET, as pci-error-recovery.txt said, so does code.
>
> Is .slot_reset now just a copy of .error_detected and we are going do
> some tricks here? or else don't get why .slot_reset signal user again.
If error_detected returns NEED_RESET, then slot_reset will always be
called and every error escalated to fatal and we've made no progress.
The point of having slot_reset is to test whether any other driver
escalated to a NEEDS_RESET; we don't want it to be called.
> > static struct pci_driver vfio_pci_driver = {
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1c46045..e883db5 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
> > count, flags, data);
> > }
> >
> > +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev,
> > + unsigned index, unsigned start,
> > + unsigned count, uint32_t flags, void *data)
> > +{
> > + if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1)
> > + return -EINVAL;
> > +
> > + return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger,
> > + count, flags, data);
> > +}
> > +
> > static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
> > unsigned index, unsigned start,
> > unsigned count, uint32_t flags, void *data)
> > @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
> > break;
> > }
> > break;
> > + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX:
> > + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > + case VFIO_IRQ_SET_ACTION_TRIGGER:
> > + if (pci_is_pcie(vdev->pdev))
> > + func = vfio_pci_set_err_trigger;
>
> s/vfio_pci_set_err_trigger/vfio_pci_set_non_fatal_err_trigger
>
> > + break;
> > + }
> > + break;
> > case VFIO_PCI_REQ_IRQ_INDEX:
> > switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
> > case VFIO_IRQ_SET_ACTION_TRIGGER:
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > index f37c73b..c27a507 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -93,6 +93,7 @@ struct vfio_pci_device {
> > struct pci_saved_state *pci_saved_state;
> > int refcnt;
> > struct eventfd_ctx *err_trigger;
> > + struct eventfd_ctx *non_fatal_err_trigger;
> > struct eventfd_ctx *req_trigger;
> > struct list_head dummy_resources_list;
> > };
> >
>