[RFC PATCH 0/3] Fix errors on DT overlay removal with devlinks

From: Michael Auchter
Date: Wed Oct 14 2020 - 15:36:56 EST


After updating to v5.9, I've started seeing errors in the kernel log
when using device tree overlays. Specifically, the problem seems to
happen when removing a device tree overlay that contains two devices
with some dependency between them (e.g., a device that provides a clock
and a device that consumes that clock). Removing such an overlay results
in:

OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy
OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy

followed by several REFCOUNT_WARNs triggered in refcount.c.

In the first patch, I've included a unittest that can be used to
reproduce this when built with CONFIG_OF_UNITTEST [1].

I believe the issue is caused by the cleanup performed when releasing
the devlink device that's created to represent the dependency between
devices. The devlink device has references to the consumer and supplier
devices, which it drops in device_link_free; the devlink device's
release callback calls device_link_free via call_srcu.

When the overlay is being removed, all devices are removed, and
eventually the release callback for the devlink device runs and
schedules cleanup using call_srcu. Before device_link_free can run and
call put_device on the consumer/supplier, the rest of the overlay
removal process runs, resulting in the error traces above.

Patches 2 and 3 are an attempt at fixing this: call srcu_barrier to wait
for any pending device_link_free's to execute before continuing on with
the removal process.
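
To illustrate the ordering problem and why a barrier helps, here is a
minimal user-space sketch. It is not kernel code and not the code from
these patches: every toy_* name is hypothetical, the deferred queue
only stands in for call_srcu, draining it stands in for srcu_barrier,
and a single counter stands in for the node refcount that the overlay
code checks.

```c
#include <assert.h>

/* Toy stand-in for a node whose refcount is checked on overlay removal. */
struct toy_node {
	int refcount;
};

typedef void (*toy_deferred_fn)(struct toy_node *);

struct toy_deferred {
	toy_deferred_fn fn;
	struct toy_node *arg;
};

#define TOY_MAX_DEFERRED 8
static struct toy_deferred toy_queue[TOY_MAX_DEFERRED];
static int toy_queued;

/* Models call_srcu(): remember the callback, but do not run it yet. */
static void toy_call_deferred(toy_deferred_fn fn, struct toy_node *arg)
{
	assert(toy_queued < TOY_MAX_DEFERRED);
	toy_queue[toy_queued].fn = fn;
	toy_queue[toy_queued].arg = arg;
	toy_queued++;
}

/* Models srcu_barrier(): run all pending callbacks before returning. */
static void toy_barrier(void)
{
	for (int i = 0; i < toy_queued; i++)
		toy_queue[i].fn(toy_queue[i].arg);
	toy_queued = 0;
}

/* Models device_link_free(): drop the reference held via the link. */
static void toy_link_free(struct toy_node *node)
{
	node->refcount--;
}

/*
 * Models overlay removal. Returns the refcount seen at the point where
 * the overlay code expects it to be 1.
 */
static int toy_remove_overlay(int use_barrier)
{
	/* One reference for the overlay, one held through the link. */
	struct toy_node node = { .refcount = 2 };
	int seen;

	/* The link's release callback only schedules the cleanup. */
	toy_call_deferred(toy_link_free, &node);

	if (use_barrier)
		toy_barrier();

	seen = node.refcount;

	/* Drain anything still pending so no callback outlives 'node'. */
	toy_barrier();
	return seen;
}
```

In this model, toy_remove_overlay(0) observes a refcount of 2, which
mirrors the "expected refcount 1 instead of 2" message, while
toy_remove_overlay(1) observes 1 because the pending cleanup has
already run by the time of the check.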

These patches resolve the issue, but probably not in the best way. In
particular, it seems strange to need to leak details of devlinks into
the device tree overlay code. So, I'd be curious to get some feedback or
hear any other ideas for how to resolve this issue.

Thanks,
Michael

1. Note that this isn't a very good unit test: it will report a "pass"
even if it fails with the aforementioned errors, as these errors
aren't propagated.

Michael Auchter (3):
of: unittest: add test of overlay with devlinks
driver core: add device_links_barrier
of: dynamic: add device links barrier before detach

drivers/base/core.c | 10 ++++++++++
drivers/of/dynamic.c | 3 +++
drivers/of/unittest-data/Makefile | 1 +
drivers/of/unittest-data/overlay_16.dts | 26 +++++++++++++++++++++++++
drivers/of/unittest.c | 16 +++++++++++++++
include/linux/device.h | 1 +
6 files changed, 57 insertions(+)
create mode 100644 drivers/of/unittest-data/overlay_16.dts

--
2.25.4