Re: [PATCH v3] virtio_balloon: Convert "vballoon" kthread into a workqueue

From: Petr Mladek
Date: Thu Nov 20 2014 - 12:17:32 EST


On Thu 2014-11-20 19:00:16, Michael S. Tsirkin wrote:
> On Thu, Nov 20, 2014 at 05:55:58PM +0100, Petr Mladek wrote:
> > On Thu 2014-11-20 11:29:35, Tejun Heo wrote:
> > > On Thu, Nov 20, 2014 at 06:26:24PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 20, 2014 at 06:25:43PM +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Nov 20, 2014 at 11:07:46AM -0500, Tejun Heo wrote:
> > > > > > On Thu, Nov 20, 2014 at 05:03:17PM +0100, Petr Mladek wrote:
> > > > > > ...
> > > > > > > @@ -476,7 +460,6 @@ static void virtballoon_remove(struct virtio_device *vdev)
> > > > > > > {
> > > > > > > struct virtio_balloon *vb = vdev->priv;
> > > > > > >
> > > > > > > - kthread_stop(vb->thread);
> > > > > > > remove_common(vb);
> > > > > > > kfree(vb);
> > > > > > > }
> > > > > >
> > > > > > Shouldn't the work item be flushed before removal is complete?
> >
> > Great catch!
> >
> > > > > In fact, flushing it won't help because it can requeue itself, right?
> > >
> > > There's cancel_work_sync() to stop the self-requeueing ones.
> >
> > Ah, one more problem is that remove_common(vb) calls leak_balloon()
> > that queues the work if not finished. We would need to add some flag
> > or variant that would disable the queuing when called here.
> >
>
> That's why Tejun suggested cancel_work_sync, IIUC it stops
> the requeuing without need for extra flags.

But he also wrote that it handles only self-requeuing. Queuing from
external locations needs to be prevented in other ways.

> > > > From that POV a dedicated WQ kept it simple.
> > >
> > > A dedicated wq doesn't do anything for that. You can't shut down a
> > > workqueue with a pending work item on it. destroy_workqueue() will
> > > try to drain the target wq, warn if it doesn't finish in certain
> > > number of iterations and just keep trying indefinitely.
> >
> > I wonder if it is guaranteed that none would trigger
> > stats_request() or virtballoon_changed() when virtballoon_remove() is
> > being called. I guess so because the original code would fail
> > otherwise. The two functions access "vb->config_change"
> > and the structure is freed in virtballoon_remove() without
> > any protection.
> >
> > I am trying to confirm this by reading the code but it is not that
> > easy.
> >
> > Best Regards,
> > Petr
>
> It's synchronized through hardware. remove_common calls reset and
> del_vqs which will prevent new interrupts.

I see, it means that stats_request() or virtballoon_changed() can be
called until vb->vdev->config->reset(vb->vdev); is called in
remove_common().

It means that fill_balloon() can be queued and run after we leak
all pages and before we reset the device in remove_common(). I have
to think about a way to avoid this. Maybe add a flag to
struct virtio_balloon that would signal that the balloon is being
removed and that new operations should no longer be queued. But there
might be a more elegant solution.
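
To make the flag idea concrete, here is a minimal userspace model of it.
It is only a sketch, not kernel code and not a proposed patch: the struct,
the "removing" field and all model_*() helpers are hypothetical names.
model_queue_work() stands in for the queue_work() call sites and
model_cancel_work() stands in for cancel_work_sync(). The model is
single-threaded; I assume real code would have to serialize the flag check
against the queuing, for example with a lock.

```c
/* Userspace model only; all names below are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct balloon_model {
	bool removing;      /* set once removal has started */
	bool work_pending;  /* stands in for a queued work item */
};

/* Stands in for the places that would call queue_work(). */
static bool model_queue_work(struct balloon_model *vb)
{
	if (vb->removing)
		return false;   /* refuse new work during removal */
	vb->work_pending = true;
	return true;
}

/* Stands in for cancel_work_sync(): drops whatever is pending. */
static void model_cancel_work(struct balloon_model *vb)
{
	vb->work_pending = false;
}

/* Removal: block new queuing first, then cancel what is already queued. */
static void model_remove(struct balloon_model *vb)
{
	vb->removing = true;
	model_cancel_work(vb);
	assert(!vb->work_pending);
}
```

The ordering in model_remove() is the point: setting the flag before
canceling means that a late request from an external caller cannot put the
work back on the queue after it has been canceled.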

Best Regards,
Petr