Re: [PATCH v5 8/8] platform/x86/amd/hsmp: Make metric table read locking use guard(mutex)

From: Ilpo Järvinen

Date: Thu Jun 11 2026 - 09:18:19 EST


On Thu, 11 Jun 2026, Muralidhara M K wrote:

> hsmp_metric_tbl_read() refreshes the SMU-side metric table and then
> memcpy_fromio()'s the result. Without serialization, two parallel
> readers can interleave the refresh and the copy and the caller
> observes a torn (mixed old/new) snapshot. Add a per-socket
> metric_tbl_lock so the refresh-and-copy sequence is atomic from
> userspace's point of view.
>
> Use scoped guard(mutex) so the lock is released on every return
> path without hand-written goto chains, and initialize the mutex
> with devm_mutex_init() so no explicit mutex_destroy() cleanup is
> required.
>
> Initialize the mutex before devm_ioremap() so the invariant
> "sock->metric_tbl_addr != NULL implies metric_tbl_lock is usable"
> holds on every error exit. Both callers of hsmp_get_tbl_dram_base()
> (init_acpi() and init_platform_device()) intentionally only log a
> failure and continue probing, so initializing the mutex after a
> successful ioremap would leave sock->metric_tbl_addr populated with
> an uninitialized lock, and the next hsmp_metric_tbl_read() would
> take guard(mutex)() on garbage memory. With the order swapped, a
> devm_mutex_init() failure returns early before metric_tbl_addr is
> ever set, and the existing NULL check in hsmp_metric_tbl_read()
> keeps rejecting the read with -ENOMEM as before.
>
> Reviewed-by: Muthusamy Ramalingam <muthusamy.ramalingam@xxxxxxx>
> Signed-off-by: Muralidhara M K <muralidhara.mk@xxxxxxx>
> ---
> drivers/platform/x86/amd/hsmp/hsmp.c | 19 +++++++++++++++++++
> drivers/platform/x86/amd/hsmp/hsmp.h | 3 +++
> 2 files changed, 22 insertions(+)
>
> diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
> index a9dca97568b8..46e8dc7cfb60 100644
> --- a/drivers/platform/x86/amd/hsmp/hsmp.c
> +++ b/drivers/platform/x86/amd/hsmp/hsmp.c
> @@ -479,6 +479,7 @@ ssize_t hsmp_metric_tbl_read(struct hsmp_socket *sock, char *buf, size_t size)
> msg.msg_id = HSMP_GET_METRIC_TABLE;
> msg.sock_ind = sock->sock_ind;
>
> + guard(mutex)(&sock->metric_tbl_lock);
> ret = hsmp_send_message(&msg);
> if (ret)
> return ret;
> @@ -495,6 +496,24 @@ int hsmp_get_tbl_dram_base(u16 sock_ind)
> phys_addr_t dram_addr;
> int ret;
>
> + /*
> + * Initialize the per-socket lock before anything that can set
> + * sock->metric_tbl_addr to a non-NULL value. hsmp_metric_tbl_read()
> + * gates on sock->metric_tbl_addr being non-NULL and then takes
> + * metric_tbl_lock unconditionally; both callers of this function
> + * (init_acpi() and init_platform_device()) intentionally only log
> + * a failure here and continue probing, so an init order that left
> + * metric_tbl_addr populated while devm_mutex_init() failed would
> + * leave the read path locking an uninitialized mutex. Doing the
> + * mutex init first preserves the invariant "metric_tbl_addr !=
> + * NULL implies the lock is usable" on every error exit.
> + */
> + ret = devm_mutex_init(sock->dev, &sock->metric_tbl_lock);
> + if (ret) {
> + dev_err(sock->dev, "Failed to initialize metric table lock\n");
> + return ret;
> + }

Sashiko flags a concurrency problem here.

This fundamentally stems from earlier design decisions:

1) hsmp_acpi_probe() is not really doing any concurrency control for
.is_probed access. I somehow seem to recall I did brought this up earlier
with somebody else working with this driver earlier but apparently there
still are not locks or other concurrency control in the probe.
I don't remember anymore what happened with it back then.

(The problem #1 is not exactly mentioned by sashiko but it's there,
AFAICT, nothing guarantees only one probe sees !hsmp_pdev->is_probed and
assigns to ->sock.)

2) ->sock teardown being bound to which ever socket allocated ->sock.
Leading to use-after-free in devm_ teardown for any remove that runs after
it.

I think the early teardown of the misc device was the only thing that
initially prevented use-after-frees. As it kind of worked, I never voiced
my concerns about how fragile the teardown was. Looking through the
history now, it seems things got broken after adding hwmon code which does
use devm and calls hsmp_send_message(). As a result, removing this driver
is currently broken. This patch adds to the problem.


I don't think is_probed is good solution here but the release of ->sock
should be properly reference counted and that might be reusable for the
alloc side.

> msg.sock_ind = sock_ind;
> msg.response_sz = hsmp_msg_desc_table[HSMP_GET_METRIC_TABLE_DRAM_ADDR].response_sz;
> msg.msg_id = HSMP_GET_METRIC_TABLE_DRAM_ADDR;
> diff --git a/drivers/platform/x86/amd/hsmp/hsmp.h b/drivers/platform/x86/amd/hsmp/hsmp.h
> index e7f051475728..f7b1cbf19932 100644
> --- a/drivers/platform/x86/amd/hsmp/hsmp.h
> +++ b/drivers/platform/x86/amd/hsmp/hsmp.h
> @@ -15,6 +15,7 @@
> #include <linux/hwmon.h>
> #include <linux/kconfig.h>
> #include <linux/miscdevice.h>
> +#include <linux/mutex.h>
> #include <linux/pci.h>
> #include <linux/semaphore.h>
> #include <linux/sysfs.h>
> @@ -41,6 +42,8 @@ struct hsmp_socket {
> struct bin_attribute hsmp_attr;
> struct hsmp_mbaddr_info mbinfo;
> void __iomem *metric_tbl_addr;
> + /* Serializes concurrent metric table refreshes from the sysfs path */
> + struct mutex metric_tbl_lock;
> void __iomem *virt_base_addr;
> struct semaphore hsmp_sem;
> char name[HSMP_ATTR_GRP_NAME_SIZE];
>

--
i.