Re: [PATCH V6 01/20] platform/x86/intel/vsec: Fix xa_alloc memory leak

From: Hans de Goede
Date: Mon Dec 04 2023 - 08:52:21 EST


Hi,

On 11/30/23 12:02, Ilpo Järvinen wrote:
> On Wed, 29 Nov 2023, David E. Box wrote:
>
>> Commit 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery
>> support to Intel PMT") added an xarray to track the list of vsec devices to
>> be recovered after a PCI error. But it did not provide cleanup for the list
>> leading to a memory leak that was caught by kmemleak. Do xa_alloc() before
>> devm_add_action_or_reset() so that the list may be cleaned up with
>> xa_erase() in the release function.
>>
>> Fixes: 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery support to Intel PMT")
>> Signed-off-by: David E. Box <david.e.box@xxxxxxxxxxxxxxx>
>> ---
>>
>> V6 - Move xa_alloc() before ida_alloc() to reduce mutex use during error
>> recovery.
>> - Fix return value after id_alloc() fail
>> - Add Fixes tag
>> - Add more detail to changelog
>>
>> V5 - New patch
>>
>> drivers/platform/x86/intel/vsec.c | 24 ++++++++++++++----------
>> drivers/platform/x86/intel/vsec.h | 1 +
>> 2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/platform/x86/intel/vsec.c b/drivers/platform/x86/intel/vsec.c
>> index c1f9e4471b28..2d568466b4e2 100644
>> --- a/drivers/platform/x86/intel/vsec.c
>> +++ b/drivers/platform/x86/intel/vsec.c
>> @@ -120,6 +120,8 @@ static void intel_vsec_dev_release(struct device *dev)
>> {
>> struct intel_vsec_device *intel_vsec_dev = dev_to_ivdev(dev);
>>
>> + xa_erase(&auxdev_array, intel_vsec_dev->id);
>> +
>> mutex_lock(&vsec_ida_lock);
>> ida_free(intel_vsec_dev->ida, intel_vsec_dev->auxdev.id);
>> mutex_unlock(&vsec_ida_lock);
>> @@ -135,19 +137,27 @@ int intel_vsec_add_aux(struct pci_dev *pdev, struct device *parent,
>> struct auxiliary_device *auxdev = &intel_vsec_dev->auxdev;
>> int ret, id;
>>
>> - mutex_lock(&vsec_ida_lock);
>> - ret = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> - mutex_unlock(&vsec_ida_lock);
>> + ret = xa_alloc(&auxdev_array, &intel_vsec_dev->id, intel_vsec_dev,
>> + PMT_XA_LIMIT, GFP_KERNEL);
>> if (ret < 0) {
>> kfree(intel_vsec_dev->resource);
>> kfree(intel_vsec_dev);
>> return ret;
>> }
>>
>> + mutex_lock(&vsec_ida_lock);
>> + id = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
>> + mutex_unlock(&vsec_ida_lock);
>> + if (id < 0) {
>> + kfree(intel_vsec_dev->resource);
>> + kfree(intel_vsec_dev);
>> + return id;
>
> Thanks, this looks much better this way around but it is missing
> xa_alloc() rollback now that the order is reversed.
>
> Once that is fixed,
>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>

I have fixed this up, adding the missing:

xa_erase(&auxdev_array, intel_vsec_dev->id);

to this error-exit path while merging this.

While looking into this I did find one other thing which
worries me (different issue, needs a separate fix):

intel_vsec_pci_slot_reset() uses

devm_release_action(&pdev->dev, intel_vsec_remove_aux,
&intel_vsec_dev->auxdev);

and seems to assume that after this intel_vsec_remove_aux()
has run for the auxdev-s. *But this is not the case*

devm_release_action() only removes the action from the list
of devres resources tied to the parent PCI device.

It does *NOT* call the registered action function,
so intel_vsec_remove_aux() is NOT called here.

And then on re-probing the device as is done in
intel_vsec_pci_slot_reset() a second set of aux
devices with the same parent will be created AFAICT.

So it seems that this also needs an explicit
intel_vsec_remove_aux() call for each auxdev!

###

This makes me wonder if the PCI error handling here
and specifically the reset code was ever tested ?

###

Note that simply forcing a reprobe using device_reprobe()
will cause all the aux-devices to also get removed through
the action on driver-unbind without ever needing
the auxdev_array at all!

I guess that you want the removal to happen before
the pci_restore_state(pdev) state though, so that
simply relying on the removal on driver unbind
is not an option ?

Regards,

Hans