Re: [PATCH v6 2/2] x86/MCE/AMD: Fix memory leak when `threshold_create_bank()` fails

From: Ammar Faizi
Date: Sun Apr 03 2022 - 13:44:19 EST


On 4/4/22 12:03 AM, Thomas Gleixner wrote:
> On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:
>
>> In mce_threshold_create_device(), if threshold_create_bank() fails, the
>> @bp will be leaked, because the call to mce_threshold_remove_device()
>> will not free the @bp. mce_threshold_remove_device() frees
>> @threshold_banks. At that point, the @bp has not been written to
>> @threshold_banks, @threshold_banks is NULL, so the call is just a nop.
>>
>> Fix this by extracting the cleanup part into a new static function
>> __threshold_remove_device(), then call it from create/remove device
>> functions.
>
> The way simpler fix is to move
>
>     }
>     this_cpu_write(threshold_banks, bp);
>
> before the loop. That's safe because the banks cannot yet be reached via
> an MCE as the vector is not yet enabled:
>
>     if (thresholding_irq_en)
>         mce_threshold_vector = amd_threshold_interrupt;

Thomas,

I did it the way you suggested (in patch v4), but after Yazen and Borislav
reviewed it, we concluded that it is not safe.

See [1] and [2] for the full message.

[1]: https://lore.kernel.org/lkml/YkFsQhpGGXIFTMyp@xxxxxxx/
[2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/

Yazen, Borislav, please take a closer look at this again. I will send
a v7 revision that really makes it simpler by moving that "per-CPU var
write" before the loop.
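
To make the leak and the two orderings concrete, here is a rough
user-space sketch in C. It is not the kernel code: threshold_banks is a
plain global here rather than a per-CPU variable, and create_bank(),
create_device(), remove_device(), model_alloc(), model_free(), NR_BANKS
and the live_allocs counter are all hypothetical stand-ins made up for
illustration. It only models the allocation bookkeeping; it says nothing
about whether the early write is safe against the threshold interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_BANKS 4	/* assumed bank count, for illustration only */

/* Stand-in for the per-CPU threshold_banks pointer (a plain global here). */
static void **threshold_banks;

/* Number of live allocations, so a leak shows up as a non-zero count. */
static int live_allocs;

static void *model_alloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (!p)
		return;
	live_allocs--;
	free(p);
}

/* Models threshold_create_bank(); fails on bank @fail_at (-1: never). */
static int create_bank(void **bp, int bank, int fail_at)
{
	assert(bank >= 0 && bank < NR_BANKS);
	if (bank == fail_at)
		return -1;
	bp[bank] = model_alloc(1, 16);
	return bp[bank] ? 0 : -1;
}

/* Models mce_threshold_remove_device(): frees only what was published. */
static void remove_device(void)
{
	void **bp = threshold_banks;
	int bank;

	if (!bp)
		return;		/* nothing published yet: the call is a nop */

	threshold_banks = NULL;
	for (bank = 0; bank < NR_BANKS; bank++)
		model_free(bp[bank]);
	model_free(bp);
}

/*
 * Models mce_threshold_create_device(). With @publish_early false, bp is
 * written to threshold_banks after the loop (the leaking order); with it
 * true, the write happens before the loop.
 */
static int create_device(bool publish_early, int fail_at)
{
	void **bp = model_alloc(NR_BANKS, sizeof(*bp));
	int bank;

	if (!bp)
		return -1;

	if (publish_early)
		threshold_banks = bp;

	for (bank = 0; bank < NR_BANKS; bank++) {
		if (create_bank(bp, bank, fail_at)) {
			remove_device();
			return -1;
		}
	}

	if (!publish_early)
		threshold_banks = bp;
	return 0;
}
```

In this model, create_device(false, 2) fails and leaves live_allocs at 3
(bp plus the two banks created before the failure), because
remove_device() sees a NULL threshold_banks. create_device(true, 2)
fails with live_allocs back at 0.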

Thanks!

--
Ammar Faizi