Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration

From: Mark Rutland

Date: Thu Apr 16 2026 - 05:36:33 EST


On Thu, Apr 16, 2026 at 09:23:33AM +0200, Johan Hovold wrote:
> On Thu, Apr 16, 2026 at 06:40:55AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
>
> > > AFAICT you're saying that the reference was taken *within*
> > > platform_device_register(), and then platform_device_register() itself
> > > has failed. I think it's surprising that platform_device_register()
> > > doesn't clean that up itself in the case of an error.
> > >
> > > There are *tonnes* of calls to platform_device_register() throughout the
> > > kernel that don't even bother to check the return value, and many that
> > > just pass the return onto a caller that can't possibly know to call
> > > platform_device_put().
> > >
> > > Code in the same file as platform_device_register() expects it to clean up
> > > after itself, e.g.
> > >
> > > | int platform_add_devices(struct platform_device **devs, int num)
> > > | {
> > > | int i, ret = 0;
> > > |
> > > | for (i = 0; i < num; i++) {
> > > | ret = platform_device_register(devs[i]);
> > > | if (ret) {
> > > | while (--i >= 0)
> > > | platform_device_unregister(devs[i]);
> > > | break;
> > > | }
> > > | }
> > > |
> > > | return ret;
> > > | }
> > >
> > > That's been there since the initial git commit, and back then,
> > > platform_device_register() didn't mention that callers needed to perform
> > > any cleanup.
> > >
> > > I see a comment was added to platform_device_register() in commit:
> > >
> > > 67e532a42cf4 ("driver core: platform: document registration-failure requirement")
> > >
> > > ... and that copied the commend added for device_register() in commit:
> > >
> > > 5739411acbaa ("Driver core: Clarify device cleanup.")
> > >
> > > ... but the potential brokenness is so widespread, and the behaviour is
> > > so surprising, that I'd argue the real but is that device_register()
> > > doesn't clean up in case of error. I don't think it's worth changing
> > > this single instance given the prevalance and churn fixing all of that
> > > would involve.
> > >
> > > I think it would be far better to fix the core driver API such that when
> > > those functions return an error, they've already cleaned up for
> > > themselves.
> > >
> > > Greg, am I missing some functional reason why we can't rework
> > > device_register() and friends to handle cleanup themselves? I appreciate
> > > that'll involve churn for some callers, but AFAICT the majority of
> > > callers don't have the required cleanup.
> >
> > Yes, we should fix the platform core code here, this should not be
> > required to do everywhere as obviously we all got it wrong.
>
> It's not just the platform code as this directly reflects the behaviour
> of device_register() as Mark pointed out.
>
> It is indeed an unfortunate quirk of the driver model, but one can argue
> that having a registration function that frees its argument on errors
> would be even worse. And even more so when many (or most) users get this
> right.

Ah, sorry; I had missed that the _put() step would actually free the
object (and as you explain below, how that won't work for many callers).

> So if we want to change this, I think we would need to deprecate
> device_register() in favour of explicit device_initialize() and
> device_add().

Is is possible to have {platfom_,}device_uninitialize() functions that
does everything except the ->release() call? If we had that, then we'd
be able to have a flow along the lines of:

int some_init_function(void)
{
int err;

platform_device_init(&static_pdev);

err = platform_device_add(&static_pdev))
if (err)
goto out_uninit;

return 0;

out_uninit:
platform_device_uninit(&static_pdev);
return err;
}

... which I think would align with what people generally expect to have
to do.

Those would have to check that only a single reference was held (from
the corresponding _initialize()), and could WARN/fail if more were held.

> That said, most users of platform_device_register() appear to operate
> on static platform devices which don't even have a release function and
> would trigger a WARN() if we ever drop the reference (which is arguably
> worse than leaking a tiny bit of memory).
>
> So leaving things as-is is also an option.

I suspect that might be the best option for now.

Mark.