Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Guangshuo Li
Date: Thu Apr 16 2026 - 02:35:02 EST
Hi Mark, Greg,
Thanks for the feedback.
On Thu, 16 Apr 2026 at 12:41, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
> > Hi,
> >
> > Thanks for the patch, but from a quick skim, I don't think this is the right
> > fix.
> >
> > Greg, I think we might want to rework the core API here; question for
> > you at the end.
> >
> > On Thu, Apr 16, 2026 at 01:41:59AM +0800, Guangshuo Li wrote:
> > > When platform_device_register() fails in arm_acpi_register_pmu_device(),
> > > the embedded struct device in pdev has already been initialized by
> > > device_initialize(), but the failure path only unregisters the GSI and
> > > does not drop the device reference for the current platform device:
> > >
> > > arm_acpi_register_pmu_device()
> > >   -> platform_device_register(pdev)
> > >        -> device_initialize(&pdev->dev)
> > >        -> setup_pdev_dma_masks(pdev)
> > >        -> platform_device_add(pdev)
> > >
> > > This leads to a reference leak when platform_device_register() fails.
> >
> > AFAICT you're saying that the reference was taken *within*
> > platform_device_register(), and then platform_device_register() itself
> > has failed. I think it's surprising that platform_device_register()
> > doesn't clean that up itself in the case of an error.
> >
> > There are *tonnes* of calls to platform_device_register() throughout the
> > kernel that don't even bother to check the return value, and many that
> > just pass the return onto a caller that can't possibly know to call
> > platform_device_put().
> >
> > Code in the same file as platform_device_register() expects it to clean up
> > after itself, e.g.
> >
> > | int platform_add_devices(struct platform_device **devs, int num)
> > | {
> > | 	int i, ret = 0;
> > |
> > | 	for (i = 0; i < num; i++) {
> > | 		ret = platform_device_register(devs[i]);
> > | 		if (ret) {
> > | 			while (--i >= 0)
> > | 				platform_device_unregister(devs[i]);
> > | 			break;
> > | 		}
> > | 	}
> > |
> > | 	return ret;
> > | }
> >
> > That's been there since the initial git commit, and back then,
> > platform_device_register() didn't mention that callers needed to perform
> > any cleanup.
> >
> > I see a comment was added to platform_device_register() in commit:
> >
> > 67e532a42cf4 ("driver core: platform: document registration-failure requirement")
> >
> > ... and that copied the comment added for device_register() in commit:
> >
> > 5739411acbaa ("Driver core: Clarify device cleanup.")
> >
> > ... but the potential brokenness is so widespread, and the behaviour is
> > so surprising, that I'd argue the real bug is that device_register()
> > doesn't clean up in case of error. I don't think it's worth changing
> > this single instance given the prevalence and churn fixing all of that
> > would involve.
> >
> > I think it would be far better to fix the core driver API such that when
> > those functions return an error, they've already cleaned up for
> > themselves.
> >
> > Greg, am I missing some functional reason why we can't rework
> > device_register() and friends to handle cleanup themselves? I appreciate
> > that'll involve churn for some callers, but AFAICT the majority of
> > callers don't have the required cleanup.
>
> Yes, we should fix the platform core code here, this should not be
> required to do everywhere as obviously we all got it wrong.
>
> Guangshuo, can you submit a patch to do that instead and ask for all of
> your other patches to not be applied as well?
>
> thanks,
>
> greg k-h
I agree that fixing this in the platform core makes more sense than
handling it in individual callers.
I'll look into the core code and send a patch for that instead. I'll
also ask for my other related patches not to be applied.
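To make the two options concrete, here is a rough userspace model (not kernel code) of the reference counting involved. struct fake_pdev and the fake_* helpers are hypothetical stand-ins for struct platform_device, device_initialize(), platform_device_add() and platform_device_put(); the actual core patch may well look different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only: struct fake_pdev is a hypothetical stand-in for
 * struct platform_device that tracks just a reference count.
 */
struct fake_pdev {
	int refcount;
	bool released;        /* set once the last reference is dropped */
	bool add_should_fail; /* test knob: force the "add" step to fail */
};

/* Models device_initialize(): the device starts out holding one reference. */
static void fake_device_initialize(struct fake_pdev *pdev)
{
	pdev->refcount = 1;
	pdev->released = false;
}

/* Models platform_device_put(): drop a reference, "release" on the last one. */
static void fake_device_put(struct fake_pdev *pdev)
{
	assert(pdev->refcount > 0);
	if (--pdev->refcount == 0)
		pdev->released = true;
}

/* Models platform_device_add(): returns a negative errno value on failure. */
static int fake_device_add(struct fake_pdev *pdev)
{
	return pdev->add_should_fail ? -EINVAL : 0;
}

/* Current contract: on failure the initial reference is still held. */
static int fake_register_current(struct fake_pdev *pdev)
{
	fake_device_initialize(pdev);
	return fake_device_add(pdev);
}

/* Per-caller fix: every caller must remember to drop that reference. */
static int fake_caller_with_put(struct fake_pdev *pdev)
{
	int ret = fake_register_current(pdev);

	if (ret)
		fake_device_put(pdev);
	return ret;
}

/* Core fix (sketch): registration drops its own reference on failure. */
static int fake_register_cleans_up(struct fake_pdev *pdev)
{
	int ret;

	fake_device_initialize(pdev);
	ret = fake_device_add(pdev);
	if (ret)
		fake_device_put(pdev);
	return ret;
}
```

In this model a failed fake_register_current() leaves the refcount at 1 (the leak), while fake_register_cleans_up() leaves it at 0 without any help from the caller, which is what platform_add_devices() above already assumes. One thing to keep in mind: a caller that already calls platform_device_put() after a failed platform_device_register() would drop the reference twice once the core cleans up, so such callers would need updating as part of the same change.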
Thanks,
Guangshuo