Re: [PATCH v7 2/5] of: change overlay apply input data from unflattened to FDT

From: Frank Rowand
Date: Wed Apr 25 2018 - 14:41:00 EST

On 04/24/18 22:23, Jan Kiszka wrote:
> On 2018-04-24 22:56, Frank Rowand wrote:
>> Hi Alan,
>> On 04/23/18 15:38, Frank Rowand wrote:
>>> Hi Jan,
>>> + Alan Tull for fpga perspective
>>> On 04/22/18 03:30, Jan Kiszka wrote:
>>>> On 2018-04-11 07:42, Jan Kiszka wrote:
>>>>> On 2018-04-05 23:12, Rob Herring wrote:
>>>>>> On Thu, Apr 5, 2018 at 2:28 PM, Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>>>>>>> On 04/05/18 12:13, Jan Kiszka wrote:
>>>>>>>> On 2018-04-05 20:59, Frank Rowand wrote:
>>>>>>>>> Hi Jan,
>>>>>>>>> On 04/04/18 15:35, Jan Kiszka wrote:
>>>>>>>>>> Hi Frank,
>>>>>>>>>> On 2018-03-04 01:17, frowand.list@xxxxxxxxx wrote:
>>>>>>>>>>> From: Frank Rowand <frank.rowand@xxxxxxxx>
>>>>>>>>>>> Move duplicating and unflattening of an overlay flattened devicetree
>>>>>>>>>>> (FDT) into the overlay application code. To accomplish this,
>>>>>>>>>>> of_overlay_apply() is replaced by of_overlay_fdt_apply().
>>>>>>>>>>> The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
>>>>>>>>>>> code, which is thus responsible for freeing the duplicate FDT. The
>>>>>>>>>>> caller of of_overlay_fdt_apply() remains responsible for freeing the
>>>>>>>>>>> original FDT.
>>>>>>>>>>> The unflattened devicetree now belongs to devicetree code, which is
>>>>>>>>>>> thus responsible for freeing the unflattened devicetree.
>>>>>>>>>>> These ownership changes prevent early freeing of the duplicated FDT
>>>>>>>>>>> or the unflattened devicetree, which could result in use after free
>>>>>>>>>>> errors.
>>>>>>>>>>> of_overlay_fdt_apply() is a private function for the anticipated
>>>>>>>>>>> overlay loader.
>>>>>>>>>> We are using of_fdt_unflatten_tree + of_overlay_apply in the
>>>>>>>>>> (out-of-tree) Jailhouse loader driver in order to register a virtual
>>>>>>>>>> device during hypervisor activation with Linux. The DT overlay is
>>>>>>>>>> created from a template but modified prior to application to account
>>>>>>>>>> for runtime-specific parameters. See [1] for the current implementation.
>>>>>>>>>> I'm now wondering how to model that scenario best with the new API.
>>>>>>>>>> Given that the loader lost ownership of the unflattened tree but the
>>>>>>>>>> modification API exists only for that DT state, I'm not yet seeing a
>>>>>>>>>> clear solution. Should we apply the template in disabled form (status =
>>>>>>>>>> "disabled"), modify it, and then activate it while it is already applied?
>>>>>>>>> Thank you for the pointer to the driver - that makes it much easier to
>>>>>>>>> understand the use case and consider solutions.
>>>>>>>>> If you can make the changes directly on the FDT instead of on the
>>>>>>>>> expanded devicetree, then you could move to the new API.
>>>>>>>> Are there some examples/references on how to edit FDTs in-place in the
>>>>>>>> kernel? I'd like to avoid writing the n-th FDT parser/generator.
>>>>>>> I don't know of any existing in-kernel edits of the FDT (but they might
>>>>>>> exist). The functions to access an FDT are in libfdt, which is in
>>>>>>> scripts/dtc/libfdt/.
>>>>>> Let's please not go down that route of doing FDT modifications. There
>>>>>> is little reason to other than for early boot changes. And it is much
>>>>>> easier to work on unflattened trees.
>>>>> I just briefly looked into libfdt, and it would have meant building it
>>>>> into the module as there are no library functions exported by the kernel
>>>>> either. Another reason to drop that.
>>>>> What's apparently working now is the pattern I initially suggested:
>>>>> Register template with status = "disabled" as overlay, then prepare and
>>>>> apply changeset that contains all needed modifications and sets the
>>>>> status to "ok". I might be leaking additional resources, but to find
>>>>> that out, I will now finally have to resolve clean unbinding of the
>>>>> generic PCI host controller [1] first.
>>>> static void free_overlay_changeset(struct overlay_changeset *ovcs)
>>>> {
>>>> [...]
>>>> /*
>>>> * TODO
>>>> *
>>>> * would like to: kfree(ovcs->overlay_tree);
>>>> * but can not since drivers may have pointers into this data
>>>> *
>>>> * would like to: kfree(ovcs->fdt);
>>>> * but can not since drivers may have pointers into this data
>>>> */
>>>> kfree(ovcs);
>>>> }
>>>> What's this? I have kmemleak now jumping at me over this. Who is supposed
>>>> to plug these leaks? The caller of of_overlay_fdt_apply has no pointers
>>>> to those objects. I would say that's a regression of the new API.
>>> The problem already existed but it was hidden. We have never been able to
>>> kfree() these objects because we do not know if there are any pointers into
>>> these objects. The new API makes the problem visible to kmemleak.
>>> The reason that we do not know if there are any pointers into these objects
>>> is that devicetree access APIs return pointers into the devicetree internal
>>> data structures (that is, into the overlay unflattened devicetree). If we
>>> want to be able to do the kfree()s, we could change the devicetree access
>>> APIs.
>>> The reason that pointers into the overlay flattened tree (ovcs->fdt) are
>>> also exposed is that the overlay unflattened devicetree property values
>>> are pointers into the overlay fdt.
>>> ** This paragraph becomes academic (and not needed) if the fix in the next
>>> paragraph can be implemented. **
>>> I _think_ that the fdt issue __for overlays__ can be fixed somewhat easily.
>>> (I would want to read through the code again to make sure I'm not missing
>>> any issues.) If the of_fdt_unflatten_tree() called by of_overlay_fdt_apply()
>>> was modified so that property values were copied into newly allocated memory
>>> and the live tree property pointers were set to the copy instead of to
>>> the value in the fdt, then I _think_ the fdt could be freed in
>>> of_overlay_fdt_apply() after calling of_overlay_apply(). The code that
>>> frees a devicetree would also have to be aware of this change -- I'm not
>>> sure if that leads to ugly complications or if it is easy. The other
>>> question to consider is whether to make the same change to
>>> of_fdt_unflatten_tree() when it is called in early boot to unflatten
>>> the base devicetree. Doing so would increase the memory usage of the
>>> live tree (we would not be able to free the base fdt after unflattening
>>> it because we make the fdt visible in /sys/firmware/fdt -- though
>>> _maybe_ that could be conditioned on CONFIG_KEXEC).
>> Question added below this paragraph.
>>> But all of the complexity of that fix is _only_ because of_overlay_apply()
>>> and of_overlay_remove() call overlay_notify(), passing in the overlay
>>> unflattened devicetree (which has pointers into the overlay fdt). Pointers
>>> into the overlay unflattened devicetree are then passed to the notifiers.
>>> (Again, I may be missing some other place that the overlay unflattened
>>> devicetree is made visible to other code -- a more thorough reading of
>>> the code is needed.) If the notifiers could be modified to accept the
>>> changeset list instead of pointers to the fragments in the overlay
>>> unflattened devicetree then there would be no possibility of the notifiers
>>> keeping a pointer into the overlay fdt. I do not know if this is a
>>> practical change for the notifiers -- there are no callers of
>>> of_overlay_notifier_register() in the mainline kernel source. My
>>> recollection is that the overlay notifiers were added for the fpga
>>> subsystem.
>> Can the fpga notifiers be changed to have the changeset as an input
>> instead of having the overlay devicetree fragment and target as an
>> input?
>> The changeset lists nodes and properties to be added, but does not
>> expose any pointers to the overlay fdt or the overlay unflattened
>> devicetree. This guarantees no leakage of pointers into the overlay
>> fdt or the overlay unflattened devicetree. The changeset contains
>> pointers to copies of data, but those copies are never freed (and
>> thus they are yet another existing memory leak).
> Also they are freed, of course: When the last reference to the node they
> point to reaches 0 (e.g. triggered by of_changeset_destroy), that node
> goes away and takes down remaining dead properties. I've ran through
> this already. And I also made sure that my code is not triggering such
> kind of leaks as well.

mea culpa. I go around in circles while trying to remember all the
overlay related issues. I needed to go back and read the code to
refresh my memory. Thanks for the prod to re-read the code.

Yes, of_changeset_destroy() will lead to the kfree() of the node and
its properties _if_ the node reference count is correct. So what I
said about a memory leak was incorrect in a perfect world (and my
memory was wrong). However, this is not a perfect world and we know
that the reference count on devicetree nodes is often incorrect due
to bugs in common infrastructure and drivers. This issue will not
be resolved until we pull all reference count manipulation into the
devicetree core. The net result is that we should not expect
overlay removal to correctly free all memory that was allocated
when applying the overlay.
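
To make the reference count point concrete, here is a toy userspace
sketch (not kernel code; every name in it is made up for illustration).
It only models the general rule that a node is freed when its last
reference is dropped, so a single missing put is enough to keep the
node, and everything hanging off it, allocated after teardown.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model, not kernel code: all names here are hypothetical. */
struct toy_node {
	int refcount;		/* stands in for the node's reference count */
	int *freed_flag;	/* set to 1 when the node is actually released */
};

static struct toy_node *toy_node_new(int *freed_flag)
{
	struct toy_node *np = malloc(sizeof(*np));

	assert(np);
	np->refcount = 1;	/* reference held by the creator */
	np->freed_flag = freed_flag;
	*freed_flag = 0;
	return np;
}

static void toy_node_get(struct toy_node *np)
{
	np->refcount++;
}

/* Drop one reference; free the node when the last reference goes away. */
static void toy_node_put(struct toy_node *np)
{
	assert(np->refcount > 0);
	if (--np->refcount == 0) {
		*np->freed_flag = 1;
		free(np);
	}
}

/* Returns 1 if teardown freed the node, 0 if the node was leaked. */
static int toy_teardown(int driver_forgets_put)
{
	int freed;
	struct toy_node *np = toy_node_new(&freed);

	toy_node_get(np);		/* a driver takes a reference */
	if (!driver_forgets_put)
		toy_node_put(np);	/* a correct driver drops it again */
	toy_node_put(np);		/* teardown drops the creator's reference */

	if (!freed)
		free(np);	/* reclaimed here only so the demo is leak-free */
	return freed;
}
```

Calling toy_teardown(0) models a driver with balanced get/put and
reports the node as freed; toy_teardown(1) models a driver that forgets
the put and reports a leak.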

I _think_ (but did not spend the time to verify) that there is a small
corner case memory leak even if the reference count on devicetree
nodes is correct. If an overlay adds a property to an existing node
then removing the overlay will not kfree() the property, and it
will remain on the deadprops list. There are some places that
properties are removed from deadprops, but I don't think they fully
resolve the issue. Again, this is a corner case, and I am willing
to document it as a limitation until it gets fixed.

Then returning to me going around in circles... This thread led me to
think that since the overlay apply code copied data into never
freed memory (false premise, as you pointed out) that we did not
have to worry about drivers retaining pointers into overlay data
after the overlay had been freed (with the one remaining exposure
being via the overlay notifiers, which _might_ be easily resolved,
pending Alan's analysis) -- this would have been great news for
removing an issue for general use of overlays.

But now we are back to the long-standing problem that we have no way
of knowing whether there are any live pointers to the memory that is
freed by of_changeset_destroy(). And I am not aware of any solution
to this problem other than changing the devicetree access API so that
it never returns any pointer into the live devicetree.
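
As a rough illustration of the difference between the two API styles,
here is another toy userspace sketch (again not kernel code, and the
names are hypothetical). One accessor hands out a pointer into the
property, which dangles once the property is freed; the other copies
the value into a buffer owned by the caller, so freeing the property
cannot invalidate what the caller holds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy userspace model; names are hypothetical, not the kernel's API. */
struct toy_prop {
	char *value;
	size_t length;		/* includes the terminating NUL */
};

static struct toy_prop *toy_prop_new(const char *val)
{
	struct toy_prop *pp = malloc(sizeof(*pp));

	assert(pp);
	pp->length = strlen(val) + 1;
	pp->value = malloc(pp->length);
	assert(pp->value);
	memcpy(pp->value, val, pp->length);
	return pp;
}

static void toy_prop_free(struct toy_prop *pp)
{
	free(pp->value);
	free(pp);
}

/* Pointer-returning style: the result dangles once pp is freed. */
static const char *toy_prop_peek(const struct toy_prop *pp)
{
	return pp->value;
}

/* Copy-out style: returns 0 on success, -1 if buf is too small. */
static int toy_prop_read(const struct toy_prop *pp, char *buf, size_t buflen)
{
	if (buflen < pp->length)
		return -1;
	memcpy(buf, pp->value, pp->length);
	return 0;
}

/* Returns 1 if the caller's copy is intact after the property is freed. */
static int toy_copy_survives_free(void)
{
	char buf[16];
	struct toy_prop *pp = toy_prop_new("okay");
	int ret = toy_prop_read(pp, buf, sizeof(buf));

	toy_prop_free(pp);	/* anything from toy_prop_peek() is now stale */
	return ret == 0 && strcmp(buf, "okay") == 0;
}
```

The cost of the copy-out style is an extra copy and a caller-supplied
buffer on every access, which is part of why changing the access API
this way would be a large change.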

The practical impact of all of this is that if we can change the overlay
notifier parameters to include the overlay changeset instead of
the overlay devicetree, then I think that of_overlay_apply() will
be able to kfree() the overlay fdt and overlay devicetree. And
if not of_overlay_apply(), then free_overlay_changeset().

> I agree that object management in dynamic OF code is hairy and easy to
> get wrong, e.g. by passing in static objects that suddenly change
> ownership, and the new owner expects dynamically allocated objects...
> But that is not cured by leaking memory.
> Jan