Re: [PATCH v2 06/18] xen/pvcalls: handle commands from the frontend
From: Stefano Stabellini
Date: Fri Jun 02 2017 - 14:21:18 EST
On Fri, 26 May 2017, Boris Ostrovsky wrote:
> On 05/19/2017 07:22 PM, Stefano Stabellini wrote:
> > +
> > static void pvcalls_back_work(struct work_struct *work)
> > {
> > +	struct pvcalls_back_priv *priv = container_of(work,
> > +			struct pvcalls_back_priv, register_work);
> > +	int notify, notify_all = 0, more = 1;
> > +	struct xen_pvcalls_request req;
> > +	struct xenbus_device *dev = priv->dev;
> > +
> > +	atomic_set(&priv->work, 1);
> > +
> > +	while (more || !atomic_dec_and_test(&priv->work)) {
> > +		while (RING_HAS_UNCONSUMED_REQUESTS(&priv->ring)) {
> > +			RING_COPY_REQUEST(&priv->ring,
> > +					priv->ring.req_cons++,
> > +					&req);
> > +
> > +			if (!pvcalls_back_handle_cmd(dev, &req)) {
> > +				RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
> > +					&priv->ring, notify);
> > +				notify_all += notify;
> > +			}
> > +		}
> > +
> > +		if (notify_all)
> > +			notify_remote_via_irq(priv->irq);
> > +
> > +		RING_FINAL_CHECK_FOR_REQUESTS(&priv->ring, more);
> > +	}
> > }
> >
> > static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
> > {
> > +	struct xenbus_device *dev = dev_id;
> > +	struct pvcalls_back_priv *priv = NULL;
> > +
> > +	if (dev == NULL)
> > +		return IRQ_HANDLED;
> > +
> > +	priv = dev_get_drvdata(&dev->dev);
> > +	if (priv == NULL)
> > +		return IRQ_HANDLED;
> > +
> > +	atomic_inc(&priv->work);
>
> I will paste your response here from v1 --- I thought I understood it and
> now I don't anymore.
>
> >>
> >> Is this really needed? We have a new entry on the ring, so the outer
> >> loop in pvcalls_back_work() will pick this up (by setting 'more').
> >
> > This is to avoid race conditions. A notification could be delivered
> > after RING_FINAL_CHECK_FOR_REQUESTS is called, returning more == 0, but
> > before pvcalls_back_work completes. In that case, without priv->work,
> > pvcalls_back_work wouldn't be rescheduled because it is still running
> > and the work would be left undone.
>
>
> How is this different from the case when new work comes after the outer
> loop is done but we still haven't returned from pvcalls_back_work()?

It is the same case. In fact, looking at it more closely, I think that
priv->work in its current form makes the race less likely to happen, but
doesn't prevent it completely :-(

Given that I have tried to reproduce the race in many ways and have
always failed so far, I think it is only theoretical. I have removed the
priv->work construct and added an in-code comment about the race.
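
To make the two windows concrete, here is a rough single-threaded model
of the loop in pvcalls_back_work(). It is only a sketch: struct model,
enum inject, frontend_request() and stranded_after() are made-up names,
the ring is reduced to two counters, and it takes as given the
assumption discussed above, namely that a notification arriving while
the work function is still running does not get it rescheduled.

```c
/* Hypothetical single-threaded model; these names are not from the patch. */
struct model {
	int req_prod;	/* requests published by the frontend */
	int req_cons;	/* requests consumed by the backend */
	int work;	/* stands in for priv->work */
	int running;	/* nonzero while the modelled work function executes */
	int requeued;	/* nonzero if the event managed to queue the work */
};

/* Where the extra frontend request is injected into the interleaving. */
enum inject { INJECT_NONE, INJECT_BEFORE_DEC, INJECT_AFTER_LOOP };

/* One frontend request plus the event handler that follows it. */
static void frontend_request(struct model *m)
{
	m->req_prod++;
	m->work++;			/* atomic_inc(&priv->work) */
	/* Assumption from the thread: queueing does nothing while running. */
	if (!m->running)
		m->requeued = 1;
}

/* Runs the modelled work function once; returns requests left stranded. */
static int stranded_after(enum inject when)
{
	struct model m = { .req_prod = 1 };	/* one request already waiting */
	int more = 1, injected = 0;

	m.running = 1;
	m.work = 1;			/* atomic_set(&priv->work, 1) */

	/* Same shape as: while (more || !atomic_dec_and_test(&priv->work)) */
	for (;;) {
		if (!more) {
			/* Window 1: after the final check, before the decrement. */
			if (when == INJECT_BEFORE_DEC && !injected) {
				frontend_request(&m);
				injected = 1;
			}
			if (--m.work == 0)	/* atomic_dec_and_test() */
				break;
		}
		m.req_cons = m.req_prod;		/* drain the ring */
		more = (m.req_prod != m.req_cons);	/* final check */
	}

	/* Window 2: the loop is done but the function has not returned. */
	if (when == INJECT_AFTER_LOOP)
		frontend_request(&m);
	m.running = 0;

	return m.requeued ? 0 : m.req_prod - m.req_cons;
}
```

In this model stranded_after(INJECT_BEFORE_DEC) returns 0, because the
extra increment keeps the outer loop going for one more pass, while
stranded_after(INJECT_AFTER_LOOP) returns 1: the counter is bumped but
nobody looks at it again. That is the sense in which priv->work narrows
the window without closing it.
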
> > +	queue_work(priv->wq, &priv->register_work);
> > +
> > 	return IRQ_HANDLED;
> > }