Re: Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

From: SeongJae Park
Date: Tue Dec 17 2019 - 12:28:10 EST


On Tue, 17 Dec 2019 18:10:19 +0100 "Jürgen Groß" <jgross@xxxxxxxx> wrote:

> On 17.12.19 17:24, SeongJae Park wrote:
> > On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß" <jgross@xxxxxxxx> wrote:
> >
> >> On 17.12.19 17:07, SeongJae Park wrote:
> >>> From: SeongJae Park <sjpark@xxxxxxxxx>
> >>>
> >>> The 'reclaim_memory' callback can race with driver code, as this callback
> >>> will be called from any context in which memory pressure is detected. To
> >>> deal with the case, this commit adds a spinlock to 'xenbus_device'. Whenever
> >>> the 'reclaim_memory' callback is called, the lock of the device that is
> >>> passed to the callback as its argument is held. Thus, drivers registering
> >>> a 'reclaim_memory' callback should themselves use this lock to protect any
> >>> data that might race with the callback.
> >>>
> >>> Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
> >>> ---
> >>> drivers/xen/xenbus/xenbus_probe.c | 1 +
> >>> drivers/xen/xenbus/xenbus_probe_backend.c | 10 ++++++++--
> >>> include/xen/xenbus.h | 2 ++
> >>> 3 files changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> >>> index 5b471889d723..b86393f172e6 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe.c
> >>> @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
> >>>  		goto fail;
> >>>
> >>>  	dev_set_name(&xendev->dev, "%s", devname);
> >>> +	spin_lock_init(&xendev->reclaim_lock);
> >>>
> >>>  	/* Register with generic device framework. */
> >>>  	err = device_register(&xendev->dev);
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> index 7e78ebef7c54..516aa64b9967 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
> >>>  static int backend_reclaim_memory(struct device *dev, void *data)
> >>>  {
> >>>  	const struct xenbus_driver *drv;
> >>> +	struct xenbus_device *xdev;
> >>> +	unsigned long flags;
> >>>
> >>>  	if (!dev->driver)
> >>>  		return 0;
> >>>  	drv = to_xenbus_driver(dev->driver);
> >>> -	if (drv && drv->reclaim_memory)
> >>> -		drv->reclaim_memory(to_xenbus_device(dev));
> >>> +	if (drv && drv->reclaim_memory) {
> >>> +		xdev = to_xenbus_device(dev);
> >>> +		spin_trylock_irqsave(&xdev->reclaim_lock, flags);
> >>
> >> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine,
> >> too? I can't see a reason why you'd want to disable irqs here.
> >
> > I needed to disable irqs here as this is called from the memory shrinker context.
>
> Okay.
>
> >
> > Also, I used 'trylock' because the 'probe()' and 'remove()' code of the driver
> > might allocate memory, and xen-blkback actually does. If such an allocation
> > hits memory pressure, it will trigger this shrinker callback again and then
> > deadlock.
>
> In that case you need to either return when you didn't get the lock or

Yes, it should return. I cannot believe I posted this code; it seems I made some
terrible mistake while formatting the patches. Anyway, the next version will
return if it fails to acquire the lock.
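
For illustration only, here is a rough userspace model of the "return if the
trylock fails" behaviour. It is not the code of the next version. Every name
starting with 'model_' is hypothetical, and a C11 'atomic_flag' merely stands
in for the kernel spinlock (irq handling is not modelled at all).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct xenbus_device. */
struct model_xenbus_device {
	atomic_flag reclaim_lock;	/* models xendev->reclaim_lock */
	int reclaim_calls;		/* counts runs of the driver callback */
};

static void model_device_init(struct model_xenbus_device *xdev)
{
	atomic_flag_clear(&xdev->reclaim_lock);
	xdev->reclaim_calls = 0;
}

/* Models a spinlock trylock: true only if the caller took the lock. */
static bool model_trylock(struct model_xenbus_device *xdev)
{
	return !atomic_flag_test_and_set(&xdev->reclaim_lock);
}

static void model_unlock(struct model_xenbus_device *xdev)
{
	atomic_flag_clear(&xdev->reclaim_lock);
}

/* Stand-in for the driver's reclaim_memory callback. */
static void model_driver_reclaim(struct model_xenbus_device *xdev)
{
	xdev->reclaim_calls++;
}

/*
 * Models backend_reclaim_memory(): if the lock is busy, skip this device
 * instead of waiting, because the holder may be the very allocation that
 * triggered the shrinker.
 */
static int model_backend_reclaim_memory(struct model_xenbus_device *xdev)
{
	if (!model_trylock(xdev))
		return 0;
	model_driver_reclaim(xdev);
	model_unlock(xdev);
	return 0;
}
```

With the lock free the callback runs; with the lock already held, the function
returns without calling the driver and without blocking.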


Thanks,
SeongJae Park

>
> - when obtaining the lock during probe() and remove() set a variable
>   containing the current cpu number
> - and reset that to e.g. NR_CPUS before releasing the lock again
> - in the shrinker callback do trylock, and if you didn't get the lock
>   test whether the cpu-variable above is set to your current cpu and
>   continue only if yes; if not, redo the trylock
>
>
> Juergen
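
The owner-cpu scheme listed above could be modelled in userspace roughly as
below. This is only a sketch under assumptions: all 'owner_' names and
OWNER_NONE (playing the role of NR_CPUS) are hypothetical, the cpu number is
passed in explicitly instead of being read from the running cpu, and
preemption, migration and irq state are ignored.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define OWNER_NONE (-1)		/* plays the role of NR_CPUS: no owner */

/* Hypothetical model of a device with a lock and an owner-cpu variable. */
struct owner_device {
	atomic_flag lock;
	atomic_int owner_cpu;
	int reclaim_calls;
};

static void owner_device_init(struct owner_device *d)
{
	atomic_flag_clear(&d->lock);
	atomic_init(&d->owner_cpu, OWNER_NONE);
	d->reclaim_calls = 0;
}

/* probe()/remove() path: take the lock, then record the owning cpu. */
static void owner_lock(struct owner_device *d, int cpu)
{
	while (atomic_flag_test_and_set(&d->lock))
		;	/* spin until the lock is free */
	atomic_store(&d->owner_cpu, cpu);
}

/* Reset the owner before releasing the lock again. */
static void owner_unlock(struct owner_device *d)
{
	atomic_store(&d->owner_cpu, OWNER_NONE);
	atomic_flag_clear(&d->lock);
}

/* Shrinker path: trylock, and on failure check who owns the lock. */
static void owner_shrink(struct owner_device *d, int cpu)
{
	for (;;) {
		if (!atomic_flag_test_and_set(&d->lock)) {
			d->reclaim_calls++;	/* got the lock: reclaim */
			atomic_flag_clear(&d->lock);
			return;
		}
		if (atomic_load(&d->owner_cpu) == cpu) {
			/* Nested under our own probe()/remove(): continue. */
			d->reclaim_calls++;
			return;
		}
		/* Another cpu holds the lock: redo the trylock. */
	}
}
```

In this model a shrinker call nested under the lock holder on the same cpu
proceeds without deadlocking, while a call from any other cpu keeps retrying
until the lock is released.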