Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Guangshuo Li
Date: Thu Apr 16 2026 - 05:04:29 EST
Hi Greg, Mark, Johan,
Thanks for the further comments.
On Thu, 16 Apr 2026 at 15:23, Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Thu, Apr 16, 2026 at 06:40:55AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
>
> > > AFAICT you're saying that the reference was taken *within*
> > > platform_device_register(), and then platform_device_register() itself
> > > has failed. I think it's surprising that platform_device_register()
> > > doesn't clean that up itself in the case of an error.
> > >
> > > There are *tonnes* of calls to platform_device_register() throughout the
> > > kernel that don't even bother to check the return value, and many that
> > > just pass the return onto a caller that can't possibly know to call
> > > platform_device_put().
> > >
> > > Code in the same file as platform_device_register() expects it to clean up
> > > after itself, e.g.
> > >
> > > | int platform_add_devices(struct platform_device **devs, int num)
> > > | {
> > > | int i, ret = 0;
> > > |
> > > | for (i = 0; i < num; i++) {
> > > | ret = platform_device_register(devs[i]);
> > > | if (ret) {
> > > | while (--i >= 0)
> > > | platform_device_unregister(devs[i]);
> > > | break;
> > > | }
> > > | }
> > > |
> > > | return ret;
> > > | }
> > >
> > > That's been there since the initial git commit, and back then,
> > > platform_device_register() didn't mention that callers needed to perform
> > > any cleanup.
> > >
> > > I see a comment was added to platform_device_register() in commit:
> > >
> > > 67e532a42cf4 ("driver core: platform: document registration-failure requirement")
> > >
> > > ... and that copied the commend added for device_register() in commit:
> > >
> > > 5739411acbaa ("Driver core: Clarify device cleanup.")
> > >
> > > ... but the potential brokenness is so widespread, and the behaviour is
> > > so surprising, that I'd argue the real but is that device_register()
> > > doesn't clean up in case of error. I don't think it's worth changing
> > > this single instance given the prevalance and churn fixing all of that
> > > would involve.
> > >
> > > I think it would be far better to fix the core driver API such that when
> > > those functions return an error, they've already cleaned up for
> > > themselves.
> > >
> > > Greg, am I missing some functional reason why we can't rework
> > > device_register() and friends to handle cleanup themselves? I appreciate
> > > that'll involve churn for some callers, but AFAICT the majority of
> > > callers don't have the required cleanup.
> >
> > Yes, we should fix the platform core code here, this should not be
> > required to do everywhere as obviously we all got it wrong.
>
> It's not just the platform code as this directly reflects the behaviour
> of device_register() as Mark pointed out.
>
> It is indeed an unfortunate quirk of the driver model, but one can argue
> that having a registration function that frees its argument on errors
> would be even worse. And even more so when many (or most) users get this
> right.
>
> So if we want to change this, I think we would need to deprecate
> device_register() in favour of explicit device_initialize() and
> device_add().
>
> That said, most users of platform_device_register() appear to operate
> on static platform devices which don't even have a release function and
> would trigger a WARN() if we ever drop the reference (which is arguably
> worse than leaking a tiny bit of memory).
>
> So leaving things as-is is also an option.
>
> Johan
I did some more investigation, and it looks like directly changing the
semantics of the existing API would break code that is already correct
today.
In particular, there seem to be at least two different kinds of callers:
Callers that already handle the failure path explicitly after
platform_device_register() fails. For these users, changing
platform_device_register() itself to drop the reference internally
would lead to double put / use-after-free issues.
Callers that operate on static struct platform_device objects. Many of
these do not have a release callback, so blindly dropping the
reference on failure would trigger a WARN.
Because of this, changing platform_device_register() itself to always
clean up on failure does not look safe.
One possible direction may be to leave platform_device_register()
unchanged, and instead add new helper APIs for the different cases.
For case (1), I was thinking of a helper like:
platform_device_register_and_put()
The implementation would simply call platform_device_register(), and if
that fails, call platform_device_put(). Callers converted to this helper
would then no longer perform their own put on the failure path.
For case (2), I was thinking of a helper like:
platform_device_register_static()
The implementation would first install a no-op release callback when
pdev->dev.release is not set, and then call
platform_device_register_and_put(). This would make the failure path
well-defined for static platform_device users, avoiding the reference
leak without triggering a WARN.
If this direction sounds reasonable, I would be happy to work on it and
send a patch, and I would also be very willing to help with the related
API conversion work for existing callers.
Thanks,
Guangshuo