Re: [PATCH v4 2/3] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

From: Nicolin Chen
Date: Fri Oct 09 2020 - 12:01:22 EST


On Fri, Oct 09, 2020 at 02:25:56PM +0200, Thierry Reding wrote:
> On Thu, Oct 08, 2020 at 02:12:10PM -0700, Nicolin Chen wrote:
> > On Thu, Oct 08, 2020 at 11:53:43AM +0200, Thierry Reding wrote:
> > > On Mon, Oct 05, 2020 at 06:05:46PM -0700, Nicolin Chen wrote:
> > > > On Mon, Oct 05, 2020 at 11:57:54AM +0200, Thierry Reding wrote:
> > > > > On Fri, Oct 02, 2020 at 11:58:29AM -0700, Nicolin Chen wrote:
> > > > > > On Fri, Oct 02, 2020 at 06:02:18PM +0300, Dmitry Osipenko wrote:
> > > > > > > 02.10.2020 09:08, Nicolin Chen wrote:
> > > > > > > > static int tegra_smmu_of_xlate(struct device *dev,
> > > > > > > > struct of_phandle_args *args)
> > > > > > > > {
> > > > > > > > + struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
> > > > > > > > + struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
> > > > > > > > u32 id = args->args[0];
> > > > > > > >
> > > > > > > > + of_node_put(args->np);
> > > > > > >
> > > > > > > of_find_device_by_node() takes a device reference, not an np
> > > > > > > reference. This is a bug, please remove of_node_put().
> > > > > >
> > > > > > Looks like it. Replacing it with put_device(&iommu_pdev->dev);
> > > > >
> > > > > Putting the put_device() here is wrong, though. You need to make sure
> > > > > you keep a reference to it as long as you keep accessing the data that
> > > > > is owned by it.
> > > >
> > > > I am confused. You said in the other reply (to Dmitry) that we do
> > > > need to put_device(mc->dev), where mc->dev should be the same as
> > > > iommu_pdev->dev. But here your comment sounds like we should not
> > > > put_device at all since ->probe_device/group_device/attach_dev()
> > > > will use it later.
> > >
> > > You need to call put_device() at some point to release the reference
> > > that you acquired by calling of_find_device_by_node(). If you don't
> > > release it, you're leaking the reference and the kernel isn't going to
> > > know when it's safe to delete the device.
> > >
> > > So what I'm saying is that we either release it here, which isn't quite
> > > right because we do reference data relating to the device later on. And
> >
> > I see. A small question, by the way: looking at other IOMMU
> > drivers that call driver_find_device_by_fwnode(), I found that
> > most of them call put_device() right after the lookup, and then
> > call dev_get_drvdata() after putting the device.
> >
> > It feels like they are doing it wrong?
>
> Well, like I said this is somewhat academic because these are all
> referencing the IOMMU, which by definition still needs to be around
> when this code is called, and there are locks in place to ensure
> these don't go away. So it's not like these drivers are doing it
> wrong, they're just not doing it pedantically right.
>
> >
> > > because it isn't quite right there should be a reason to justify it,
> > > which is that the SMMU parent device is the same as the MC, so the
> > > reference count isn't strictly necessary. But that's not quite obvious,
> > > so highlighting it in a comment makes sense.
> > >
> > > The other alternative is to not call put_device() here and hold on to
> > > the reference as long as you keep using "mc". This might be difficult to
> > > implement because it may not be obvious where to release it. I think
> > > this is the better alternative, but if it's too complicated to implement
> > > it might not be worth it.
> >
> > I think so too. The device reference is taken in of_xlate(), which
> > has no obvious counterpart function. So I'll just remove put_device()
> > and add a comment, as you suggested.
>
> I think you misunderstood. Not calling put_device() would be wrong
> because that leaks a reference to the SMMU that you can't get back. My
> suggestion was rather to keep put_device() here, but add a comment as to
> why it's okay to call the put_device() here, even though you keep using
> its private data later beyond this point, which typically would be wrong
> to do.

I see. Thanks for the clarification! Will send v6 soon.
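
To make the agreed pattern concrete, here is a minimal userspace sketch of
it, not the actual driver change. All toy_* names and types are hypothetical
stand-ins for struct device, struct tegra_mc, of_find_device_by_node() and
put_device(); the refcount is a plain int rather than the kernel's kref. It
only illustrates "the lookup takes a reference, so drop it in the same
function, and document why the driver data is still used afterwards".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct device / struct tegra_mc. */
struct toy_mc {
	int id;
};

struct toy_device {
	int refcount;	/* plain int here; the kernel uses a kref */
	void *drvdata;	/* what platform_get_drvdata() would return */
};

static struct toy_device *toy_get_device(struct toy_device *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

static void toy_put_device(struct toy_device *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);	/* catch unbalanced puts */
	dev->refcount--;
}

/* Like of_find_device_by_node(): hands back the device with a reference held. */
static struct toy_device *toy_find_device(struct toy_device *registered)
{
	return toy_get_device(registered);
}

static struct toy_mc *toy_of_xlate(struct toy_device *registered)
{
	struct toy_device *iommu_dev = toy_find_device(registered);
	struct toy_mc *mc;

	if (!iommu_dev)
		return NULL;

	mc = iommu_dev->drvdata;

	/*
	 * Drop the reference taken by the lookup so it is not leaked.
	 * Using mc after this point is normally wrong; it is tolerated
	 * here only on the assumption that the owning device outlives
	 * every caller (in the thread: the SMMU's parent is the MC).
	 */
	toy_put_device(iommu_dev);

	return mc;
}
```

After a call to toy_of_xlate() the reference count is back at its starting
value, which is the property the put_device() call is meant to preserve.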