Re: [PATCH v2] fpga: region: add owner module and take its refcount

From: Xu Yilun
Date: Thu Apr 11 2024 - 05:16:59 EST


On Wed, Apr 10, 2024 at 11:42:23AM +0200, Marco Pagani wrote:
>
>
> On 2024-04-09 06:08, Xu Yilun wrote:
> > On Wed, Apr 03, 2024 at 03:34:22PM +0200, Marco Pagani wrote:
> >>
> >>
> >> On 2024-04-01 11:34, Xu Yilun wrote:
> >>> On Wed, Mar 27, 2024 at 05:00:20PM +0100, Marco Pagani wrote:
> >>>> The current implementation of the fpga region assumes that the low-level
> >>>> module registers a driver for the parent device and uses its owner pointer
> >>>> to take the module's refcount. This approach is problematic since it can
> >>>> lead to a null pointer dereference while attempting to get the region
> >>>> during programming if the parent device does not have a driver.
> >>>>
> >>>> To address this problem, add a module owner pointer to the fpga_region
> >>>> struct and use it to take the module's refcount. Modify the functions for
> >>>> registering a region to take an additional owner module parameter and
> >>>> rename them to avoid conflicts. Use the old function names for helper
> >>>> macros that automatically set the module that registers the region as the
> >>>> owner. This ensures compatibility with existing low-level control modules
> >>>> and reduces the chances of registering a region without setting the owner.
> >>>>
> >>>> Also, update the documentation to keep it consistent with the new interface
> >>>> for registering an fpga region.
> >>>>
> >>>> Other changes: unlock the mutex before calling put_device() in
> >>>> fpga_region_put() to avoid potential use after release issues.
> >>>
> >>> Please try not to mix different changes in one patch, especially for
> >>> a "bug fix" as you said.
> >>
> >> You are right. I'll split out the change and eventually send it as a
> >> separate patch.
> >>
> >>> And I do have a concern about the fix, see below.
> >>>
> >>> [...]
> >>>
> >>>> @@ -53,7 +53,7 @@ static struct fpga_region *fpga_region_get(struct fpga_region *region)
> >>>> }
> >>>>
> >>>> get_device(dev);
> >>>> - if (!try_module_get(dev->parent->driver->owner)) {
> >>>> + if (!try_module_get(region->br_owner)) {
> >>>> put_device(dev);
> >>>> mutex_unlock(&region->mutex);
> >>>> return ERR_PTR(-ENODEV);
> >>>> @@ -75,9 +75,9 @@ static void fpga_region_put(struct fpga_region *region)
> >>>>
> >>>> dev_dbg(dev, "put\n");
> >>>>
> >>>> - module_put(dev->parent->driver->owner);
> >>>> - put_device(dev);
> >>>> + module_put(region->br_owner);
> >>>> mutex_unlock(&region->mutex);
> >>>
> >>> If there is concern the region would be freed after put_device(), then
> >>> why still keep the sequence in fpga_region_get()?
> >>
> >> Ouch, sorry, I forgot to make the change also in fpga_region_get().
> >>
> >>> And is it possible region is freed before get_device() in
> >>> fpga_region_get()?
> >>
> >> If the user follows the usual pattern (i.e., waiting for
> >
> > The only safe way I can see is for fpga_region_program_fpga() or
> > fpga_region_get() to be called within:
> >
> > region = fpga_region_class_find();
> > ...
> > put_device(&region->dev);
> >
> > That is to say, fpga_region_get() should not be called unless a
> > reference to the region dev is already held. In that case there is
> > no use-after-release risk. That's why I was thinking about some
> > documentation.
> >
> > Another concern is that we'd better keep the get/put operations
> > symmetrical for ease of maintenance, as long as it doesn't cause problems.
>
> Now I see your point. So, you suggest changing only the docs to clarify
> that the region must be taken with fpga_region_class_find() before
> programming it with fpga_region_program_fpga()?

Like:

The reference to the region must already be held, e.g. by
fpga_region_class_find().

>
> That's fine by me. However, this made me wonder why we need to take the
> region dev with get_device() in fpga_region_program_fpga()->fpga_region_get().
> If we assume that the user must always call fpga_region_class_find()
> before programming with fpga_region_program_fpga(), why do we need the
> double get?

Yeah, I had the same concern when I looked at this part. I don't think it
is necessary.
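
To illustrate the reasoning, here is a toy userspace model, not kernel
code. Every toy_* name is made up for illustration, and the initial count
of 1 is an assumption standing in for the reference held by registration.
It only sketches the counting argument: as long as the caller holds the
reference taken by the lookup, the count cannot reach zero while
programming, with or without the inner get/put pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a refcounted region device; NOT the kernel's struct device. */
struct toy_region {
	int refcount;	/* models the embedded device's reference count */
	bool released;	/* set once the last reference is dropped */
};

/* Assumption: registration holds one reference, so the count starts at 1. */
static struct toy_region the_region = { .refcount = 1, .released = false };

/* Models get_device(): only legal while the object is still alive. */
static void toy_get(struct toy_region *r)
{
	assert(!r->released && r->refcount > 0);
	r->refcount++;
}

/* Models put_device(): the object is released when the count hits zero. */
static void toy_put(struct toy_region *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0)
		r->released = true;
}

/* Models fpga_region_class_find(): returns the region with a reference held. */
static struct toy_region *toy_find(void)
{
	toy_get(&the_region);
	return &the_region;
}

/*
 * Models the programming step. With inner_get set, it takes and drops an
 * extra reference around the work, like the get_device()/put_device() pair
 * discussed above. Returns the count observed while "programming".
 */
static int toy_program(struct toy_region *r, bool inner_get)
{
	int during;

	if (inner_get)
		toy_get(r);
	during = r->refcount;
	if (inner_get)
		toy_put(r);
	return during;
}
```

In this model, a caller that does toy_find(), toy_program(), toy_put()
never sees the count fall below 2 during programming, so the inner pair
adds nothing. Whether the same holds for every in-kernel caller still
needs checking against the real code.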

Thanks,
Yilun

>
> Thanks,
> Marco
>
> >> fpga_region_program_fpga() to complete before calling
> >> fpga_region_unregister()) there should be no problem. However, I think
> >> releasing the device before unlocking the mutex contained in the context
> >> associated with the device makes the code brittle and more prone to
> >> problems.
> >>
> >>> Or we should clearly document how/when to use these functions?
> >>
> >> I think it is not necessary to change the documentation since the
> >> in-kernel programming API will not be affected by the change.
> >>
> [...]
>