On 4/4/22 12:03 AM, Thomas Gleixner wrote:
On Tue, Mar 29 2022 at 17:47, Ammar Faizi wrote:

Thomas,

In mce_threshold_create_device(), if threshold_create_bank() fails, the
@bp will be leaked, because the call to mce_threshold_remove_device()
will not free the @bp. mce_threshold_remove_device() frees
@threshold_banks. At that point, the @bp has not yet been written to
@threshold_banks, so @threshold_banks is still NULL and the call is just
a nop.

Fix this by extracting the cleanup part into a new static function,
__threshold_remove_device(), and calling it from both the create and
remove device functions.
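
For illustration, here is a minimal userspace sketch of the leak and of the
helper-based fix described above. It is not the kernel code: the per-CPU
pointer is modelled as a plain static variable, create_bank(), remove_banks(),
remove_device(), create_device(), the fail_at/use_helper parameters and the
live_allocs counter are all hypothetical stand-ins, and NUM_BANKS is an
arbitrary value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_BANKS 4

struct bank { int id; };

/* Stand-in for the per-CPU pointer @threshold_banks (assumption). */
static struct bank **threshold_banks;

/* Number of live allocations, so that a leak is observable. */
static int live_allocs;

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical stand-in for threshold_create_bank(); fails at index fail_at. */
static int create_bank(struct bank **bp, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	bp[idx] = tracked_calloc(1, sizeof(*bp[idx]));
	if (!bp[idx])
		return -1;
	bp[idx]->id = idx;
	return 0;
}

/* Plays the role of __threshold_remove_device(): frees banks and array. */
static void remove_banks(struct bank **bp)
{
	assert(bp != NULL);
	for (int i = 0; i < NUM_BANKS; i++)
		tracked_free(bp[i]);
	tracked_free(bp);
}

/* Plays the role of mce_threshold_remove_device(): frees what was published. */
static void remove_device(void)
{
	struct bank **bp = threshold_banks;

	if (!bp)
		return;		/* nothing published yet, so this is a nop */
	threshold_banks = NULL;
	remove_banks(bp);
}

/* use_helper == 0 models the leaky error path, use_helper != 0 the fix. */
static int create_device(int fail_at, int use_helper)
{
	struct bank **bp = tracked_calloc(NUM_BANKS, sizeof(*bp));

	if (!bp)
		return -1;
	for (int i = 0; i < NUM_BANKS; i++) {
		if (create_bank(bp, i, fail_at)) {
			if (use_helper)
				remove_banks(bp);	/* frees the local bp */
			else
				remove_device();	/* sees NULL, bp leaks */
			return -1;
		}
	}
	threshold_banks = bp;	/* published only after every bank exists */
	return 0;
}
```

Calling create_device(2, 0) leaves the array and the banks created so far
allocated, while create_device(2, 1) releases them, because the error path
no longer depends on the pointer having been published.
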

The way simpler fix is to move

	}
	this_cpu_write(threshold_banks, bp);

before the loop. That's safe because the banks cannot yet be reached via
an MCE as the vector is not yet enabled:

	if (thresholding_irq_en)
		mce_threshold_vector = amd_threshold_interrupt;

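
As a rough userspace sketch of the "write before the loop" ordering suggested
above (not the kernel code): the per-CPU write is modelled as an assignment
to a static pointer, and create_bank(), remove_device(), visible_banks(),
create_device(), fail_at and live_allocs are hypothetical names introduced
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_BANKS 4

struct bank { int id; };

/* Stand-in for the per-CPU pointer @threshold_banks (assumption). */
static struct bank **threshold_banks;

/* Number of live allocations, so that a leak is observable. */
static int live_allocs;

/* Hypothetical stand-in for threshold_create_bank(); fails at index fail_at. */
static int create_bank(struct bank **bp, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	bp[idx] = calloc(1, sizeof(*bp[idx]));
	if (!bp[idx])
		return -1;
	live_allocs++;
	bp[idx]->id = idx;
	return 0;
}

/* Number of populated slots reachable through the published pointer. */
static int visible_banks(void)
{
	int n = 0;

	if (!threshold_banks)
		return 0;
	for (int i = 0; i < NUM_BANKS; i++)
		if (threshold_banks[i])
			n++;
	return n;
}

/* Plays the role of mce_threshold_remove_device(). */
static void remove_device(void)
{
	struct bank **bp = threshold_banks;

	if (!bp)
		return;
	threshold_banks = NULL;
	for (int i = 0; i < NUM_BANKS; i++) {
		if (bp[i]) {
			free(bp[i]);
			live_allocs--;
		}
	}
	free(bp);
	live_allocs--;
}

static int create_device(int fail_at)
{
	struct bank **bp;

	assert(threshold_banks == NULL);	/* device must not exist yet */
	bp = calloc(NUM_BANKS, sizeof(*bp));
	if (!bp)
		return -1;
	live_allocs++;

	/* Models this_cpu_write(threshold_banks, bp) moved before the loop. */
	threshold_banks = bp;

	for (int i = 0; i < NUM_BANKS; i++) {
		/* A reader running here would see only i populated slots. */
		if (create_bank(bp, i, fail_at)) {
			remove_device();	/* now finds bp and frees it */
			return -1;
		}
	}
	return 0;
}
```

With this ordering the existing remove path is enough to clean up after a
failure. The trade-off is that the array is reachable while it is only
partially filled, so the approach relies on nothing reading the pointer
until the loop has finished.
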
I did what you suggest here in v4 of the patch, but after Yazen and
Borislav reviewed it, we concluded that it's not safe.
See [1] and [2] for the full messages.
[1]: https://lore.kernel.org/lkml/YkFsQhpGGXIFTMyp@xxxxxxx/
[2]: https://lore.kernel.org/lkml/Yh+oyD%2F5M3TW5ZMM@yaz-ubuntu/
Yazen, Borislav, please take a deeper look at this again. I will send
a v7 revision that really makes it simpler by moving that "per-CPU var
write" before the loop.