Re: [PATCH v13 016/113] KVM: TDX: x86: Add ioctl to get TDX systemwide parameters

From: Dan Williams
Date: Wed Jan 31 2024 - 01:26:05 EST


Isaku Yamahata wrote:
> On Wed, Mar 29, 2023 at 04:17:22PM -0700,
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:
>
> > On Sat, Mar 25, 2023 at 10:43:06AM +0200,
> > Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:
> >
> > > On Sun, 12 Mar 2023 10:55:40 -0700
> > > isaku.yamahata@xxxxxxxxx wrote:
> > >
> > > Does this have to be a new generic ioctl with a dedicated new x86_ops? SNP
> > > does not use it at all and all the system-scoped ioctl of SNP going through
> > > the CCP driver. So getting system-scope information of TDX/SNP will end up
> > > differently.
> > >
> > > Any thought, Sean? Moving getting SNP system-wide information to
> > > KVM dev ioctl seems not ideal and TDX does not have a dedicated driver like
> > > CCP. Maybe make this ioctl TDX-specific? KVM_TDX_DEV_OP?
> >
> > We only need global parameters of the TDX module, and we don't interact with TDX
> > module at this point. One alternative is to export those parameters via sysfs.
> > Also the existence of the sysfs node indicates that the TDX module is
> > loaded(initialized?) or not in addition to boot log. Thus we can drop system
> > scope one.
> > What do you think?
> >
> > Regarding to other TDX KVM specific ioctls (KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU,
> > KVM_TDX_INIT_MEM_REGION, and KVM_TDX_FINALIZE_VM), they are specific to KVM. So
> > I don't think it can be split out to independent driver.
>
> Here is the patch to export those info via sysfs.
>
> From e0744e506eb92e47d8317e489945a3ba804edfa7 Mon Sep 17 00:00:00 2001
> Message-Id: <e0744e506eb92e47d8317e489945a3ba804edfa7.1680221730.git.isaku.yamahata@xxxxxxxxx>
> In-Reply-To: <8e0bc0e8e5d435f54f10c7642a862629ef2acb89.1680221729.git.isaku.yamahata@xxxxxxxxx>
> References: <8e0bc0e8e5d435f54f10c7642a862629ef2acb89.1680221729.git.isaku.yamahata@xxxxxxxxx>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Date: Thu, 30 Mar 2023 00:05:03 -0700
> Subject: [PATCH] x86/virt/tdx: Export TD config params of TDX module via sysfs
>
> TDX module has parameters for VMM to configure TD. User space VMM, e.g.
> qemu, needs to know it. Export them to user space via sysfs.
>
> TDX 1.0 provides TDH.SYS.INFO to provide system information in
> TDSYSINFO_STRUCT. Its future extensibility is limited because of its
> struct. From TDX 1.5, TDH.SYS.RD(metadata field_id) to read the info
> specified by field id. So instead of exporting TDSYSINFO_STRUCT, adapt
> metadata way to export those system information.

Hi, I came across tdx_sysfs_init() recently and had some comments if
this proposal is going to move forward:

>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> Documentation/ABI/testing/sysfs-firmware-tdx | 23 +++
> arch/x86/include/asm/tdx.h | 33 ++++
> arch/x86/virt/vmx/tdx/tdx.c | 164 +++++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.h | 18 ++
> 4 files changed, 238 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-firmware-tdx
>
> diff --git a/Documentation/ABI/testing/sysfs-firmware-tdx b/Documentation/ABI/testing/sysfs-firmware-tdx
> new file mode 100644
> index 000000000000..1f26fb178144
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-tdx
> @@ -0,0 +1,23 @@
> +What: /sys/firmware/tdx/tdx_module/metadata

The TDX module is not "platform firmware" in comparison to the other EFI
and ACPI inhabitants in /sys/firmware. It is especially not static
platform firmware given it needs to be dynamically activated via KVM
module initialization.

Instead, sysfs already has a location for pure software construct
objects to host a sysfs ABI and that is /sys/bus/virtual. I propose a
common "TSM" class device here [1] and TDX can simply publish a named
attribute group, "host", to extend that class device with TDX specifics.

For cross-vendor consistency "host" is a symlink to the CCP device on
AMD.

[1]: http://lore.kernel.org/r/170660662589.224441.11503798303914595072.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx

> +Date: March 2023
> +KernelVersion: 6.3
> +Contact: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>, kvm@xxxxxxxxxxxxxxx
> +Users: qemu, libvirt
> +Description:
> + The TDX feature requires a firmware that is known as the TDX
> + module. The TDX module exposes its metadata in the following
> + read-only files. The information corresponds to the TDX global
> + metadata specified by 64bit field id.
> + string in lower case. The value is binary.
> + User space VMM like qemu needs refer to them to determine what
> + parameters are needed or allowed to configure guest TDs.
> +
> + ================== ============================================
> + 1900000300000000 ATTRIBUTES_FIXED0
> + 1900000300000001 ATTRIBUTES_FIXED1
> + 1900000300000002 XFAM_FIXED0
> + 1900000300000003 XFAM_FIXED1
> + 9900000100000004 NUM_CPUID_CONFIG
> + 9900000300000400 CPUID_LEAVES
> + 9900000300000500 CPUID_VALUES
> + ================== ============================================

This documentation needs to be per file, with an explanation of how each
file is expected to be used. Someone should reasonably be able to read
this documentation and go write a tool; I don't get that from this
documentation.
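
As a rough illustration of what I mean by per-file, an entry could look
something like the sketch below. The path is simply the one from this
patch (location concerns above aside), and the size, byte order, and bit
semantics are my assumptions from reading the TDX module spec, not
something this patch states, so they would need to be confirmed:

```text
What:		/sys/firmware/tdx/tdx_module/metadata/1900000300000000
Date:		March 2023
KernelVersion:	6.3
Contact:	Isaku Yamahata <isaku.yamahata@xxxxxxxxx>, kvm@xxxxxxxxxxxxxxx
Description:
		(RO) ATTRIBUTES_FIXED0 as a raw 8-byte value in host byte
		order (size and byte order assumed here). A bit that is 0
		in this value must be 0 in the ATTRIBUTES of any TD. A VMM
		is expected to read this file before KVM_TDX_INIT_VM and
		reject guest configurations that set such a bit.
```

One entry like this per file tells a tool author how many bytes to read,
how to interpret them, and when to consult them.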

> \ No newline at end of file
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 05870e5ed131..c650ac22a916 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -110,6 +110,39 @@ struct tdx_cpuid_config {
> u32 edx;
> } __packed;
>
> +struct tdx_cpuid_config_leaf {
> + u32 leaf;
> + u32 sub_leaf;
> +} __packed;
> +static_assert(offsetof(struct tdx_cpuid_config, leaf) ==
> + offsetof(struct tdx_cpuid_config_leaf, leaf));
> +static_assert(offsetof(struct tdx_cpuid_config, sub_leaf) ==
> + offsetof(struct tdx_cpuid_config_leaf, sub_leaf));
> +static_assert(offsetofend(struct tdx_cpuid_config, sub_leaf) ==
> + sizeof(struct tdx_cpuid_config_leaf));
> +
> +struct tdx_cpuid_config_value {
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +static_assert(offsetof(struct tdx_cpuid_config, eax) -
> + offsetof(struct tdx_cpuid_config, eax) ==
> + offsetof(struct tdx_cpuid_config_value, eax));
> +static_assert(offsetof(struct tdx_cpuid_config, ebx) -
> + offsetof(struct tdx_cpuid_config, eax) ==
> + offsetof(struct tdx_cpuid_config_value, ebx));
> +static_assert(offsetof(struct tdx_cpuid_config, ecx) -
> + offsetof(struct tdx_cpuid_config, eax) ==
> + offsetof(struct tdx_cpuid_config_value, ecx));
> +static_assert(offsetof(struct tdx_cpuid_config, edx) -
> + offsetof(struct tdx_cpuid_config, eax) ==
> + offsetof(struct tdx_cpuid_config_value, edx));
> +static_assert(offsetofend(struct tdx_cpuid_config, edx) -
> + offsetof(struct tdx_cpuid_config, eax) ==
> + sizeof(struct tdx_cpuid_config_value));
> +
> #define TDSYSINFO_STRUCT_SIZE 1024
> #define TDSYSINFO_STRUCT_ALIGNMENT 1024
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f9f9c1b76501..56ca520d67d6 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -33,6 +33,12 @@
> #include <asm/tdx.h>
> #include "tdx.h"
>
> +#ifdef CONFIG_SYSFS
> +static int tdx_sysfs_init(void);
> +#else
> +static inline int tdx_sysfs_init(void) { return 0;}
> +#endif
> +
> u32 tdx_global_keyid __ro_after_init;
> EXPORT_SYMBOL_GPL(tdx_global_keyid);
> static u32 tdx_guest_keyid_start __ro_after_init;
> @@ -399,6 +405,10 @@ static int __tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
> if (ret)
> return ret;
>
> + ret = tdx_sysfs_init();
> + if (ret)
> + return ret;
> +
> pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> sysinfo->attributes, sysinfo->vendor_id,
> sysinfo->major_version, sysinfo->minor_version,
> @@ -1367,3 +1377,157 @@ int tdx_enable(void)
> return ret;
> }
> EXPORT_SYMBOL_GPL(tdx_enable);
> +
> +#ifdef CONFIG_SYSFS
> +
> +static struct kobject *tdx_kobj;
> +static struct kobject *tdx_module_kobj;
> +static struct kobject *tdx_metadata_kobj;
> +
> +#define TDX_METADATA_ATTR(_name, field_id_name, _size) \
> +static struct bin_attribute tdx_metadata_ ## _name = { \
> + .attr = { \
> + .name = field_id_name, \
> + .mode = 0444, \
> + }, \
> + .size = _size, \
> + .read = tdx_metadata_ ## _name ## _show, \
> +}
> +
> +#define TDX_METADATA_ATTR_SHOW(_name, field_id_name) \
> +static ssize_t tdx_metadata_ ## _name ## _show(struct file *filp, struct kobject *kobj, \
> + struct bin_attribute *bin_attr, \
> + char *buf, loff_t offset, size_t count) \
> +{ \
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo); \
> + \
> + return memory_read_from_buffer(buf, count, &offset, \
> + &sysinfo->_name, \
> + sizeof(sysinfo->_name)); \
> +} \
> +TDX_METADATA_ATTR(_name, field_id_name, sizeof_field(struct tdsysinfo_struct, _name))
> +
> +TDX_METADATA_ATTR_SHOW(attributes_fixed0, TDX_METADATA_ATTRIBUTES_FIXED0_NAME);
> +TDX_METADATA_ATTR_SHOW(attributes_fixed1, TDX_METADATA_ATTRIBUTES_FIXED1_NAME);
> +TDX_METADATA_ATTR_SHOW(xfam_fixed0, TDX_METADATA_XFAM_FIXED0_NAME);
> +TDX_METADATA_ATTR_SHOW(xfam_fixed1, TDX_METADATA_XFAM_FIXED1_NAME);
> +
> +static ssize_t tdx_metadata_num_cpuid_config_show(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr,
> + char *buf, loff_t offset, size_t count)
> +{
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
> + /*
> + * Although tdsysinfo_struct.num_cpuid_config is defined as u32 for
> + * alignment, TDX 1.5 defines metadata NUM_CONFIG_CPUID as u16.
> + */
> + u16 tmp = (u16)sysinfo->num_cpuid_config;
> +
> + WARN_ON_ONCE(tmp != sysinfo->num_cpuid_config);

Why crash the kernel here? With panic_on_warn set, a WARN_ON_ONCE() takes
the machine down.

> + return memory_read_from_buffer(buf, count, &offset, &tmp, sizeof(tmp));
> +}
> +TDX_METADATA_ATTR(num_cpuid_config, TDX_METADATA_NUM_CPUID_CONFIG_NAME, sizeof(u16));
> +
> +static ssize_t tdx_metadata_cpuid_leaves_show(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr, char *buf,
> + loff_t offset, size_t count)
> +{
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
> + ssize_t r;
> + struct tdx_cpuid_config_leaf *tmp;
> + u32 i;
> +
> + tmp = kmalloc(bin_attr->size, GFP_KERNEL);
> + if (!tmp)
> + return -ENOMEM;

Why is this allocating and then blindly copying bin_attr->size into
@buf? Either it knows that @buf is big enough, in which case there is no
need to allocate, or it does not know, in which case the copy into @tmp
offers no protection.
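
To make the alternative concrete, here is a minimal userspace sketch, not
kernel code, of serving the "leaves" view straight from the source array
with the offset and count clamped, so no temporary buffer is needed. The
struct names and read_leaves() are hypothetical stand-ins for the
structures quoted above, and it assumes leaf and sub_leaf are the first
two members of each config entry, as the static_asserts in this patch
check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the structs in the quoted patch. */
struct cpuid_config {
	uint32_t leaf, sub_leaf, eax, ebx, ecx, edx;
};

struct cpuid_config_leaf {
	uint32_t leaf, sub_leaf;
};

/*
 * Hypothetical helper: copy at most @count bytes of the virtual "leaves"
 * stream, starting at byte @offset, directly from @cfgs into @buf.
 * Returns the number of bytes copied, 0 at or past the end of the stream.
 */
static size_t read_leaves(char *buf, size_t count, size_t offset,
			  const struct cpuid_config *cfgs, size_t nr)
{
	const size_t rec = sizeof(struct cpuid_config_leaf);
	const size_t total = nr * rec;
	size_t done = 0;

	assert(buf && cfgs);

	if (offset >= total)
		return 0;
	/* Never hand back more than the caller asked for or the stream holds. */
	if (count > total - offset)
		count = total - offset;

	while (done < count) {
		size_t pos = offset + done;
		size_t idx = pos / rec;	/* which config entry */
		size_t off = pos % rec;	/* byte offset inside its leaf part */
		size_t n = rec - off;

		if (n > count - done)
			n = count - done;
		/* The leaf part occupies the first @rec bytes of each entry. */
		memcpy(buf + done, (const char *)&cfgs[idx] + off, n);
		done += n;
	}
	return done;
}
```

The point is only that the bound has to be applied against @count and
@offset in either case; once that is done, the intermediate copy adds
nothing.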

> +
> + for (i = 0; i < sysinfo->num_cpuid_config; i++) {
> + struct tdx_cpuid_config *c = &sysinfo->cpuid_configs[i];
> + struct tdx_cpuid_config_leaf *leaf = (struct tdx_cpuid_config_leaf *)c;
> +
> + memcpy(tmp + i, leaf, sizeof(*leaf));
> + }
> +
> + r = memory_read_from_buffer(buf, count, &offset, tmp, bin_attr->size);
> + kfree(tmp);
> + return r;
> +}
> +
> +TDX_METADATA_ATTR(cpuid_leaves, TDX_METADATA_CPUID_LEAVES_NAME, 0);
> +
> +static ssize_t tdx_metadata_cpuid_values_show(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr, char *buf,
> + loff_t offset, size_t count)
> +{
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
> + struct tdx_cpuid_config_value *tmp;
> + ssize_t r;
> + u32 i;
> +
> + tmp = kmalloc(bin_attr->size, GFP_KERNEL);
> + if (!tmp)
> + return -ENOMEM;
> +
> + for (i = 0; i < sysinfo->num_cpuid_config; i++) {
> + struct tdx_cpuid_config *c = &sysinfo->cpuid_configs[i];
> + struct tdx_cpuid_config_value *value = (struct tdx_cpuid_config_value *)&c->eax;
> +
> + memcpy(tmp + i, value, sizeof(*value));
> + }
> +
> + r = memory_read_from_buffer(buf, count, &offset, tmp, bin_attr->size);
> + kfree(tmp);
> + return r;
> +}
> +
> +TDX_METADATA_ATTR(cpuid_values, TDX_METADATA_CPUID_VALUES_NAME, 0);
> +
> +static struct bin_attribute *tdx_metadata_attrs[] = {
> + &tdx_metadata_attributes_fixed0,
> + &tdx_metadata_attributes_fixed1,
> + &tdx_metadata_xfam_fixed0,
> + &tdx_metadata_xfam_fixed1,
> + &tdx_metadata_num_cpuid_config,
> + &tdx_metadata_cpuid_leaves,
> + &tdx_metadata_cpuid_values,
> + NULL,
> +};
> +
> +static const struct attribute_group tdx_metadata_attr_group = {
> + .bin_attrs = tdx_metadata_attrs,
> +};
> +
> +static int tdx_sysfs_init(void)
> +{
> + struct tdsysinfo_struct *sysinfo;
> + int ret;
> +
> + tdx_kobj = kobject_create_and_add("tdx", firmware_kobj);
> + if (!tdx_kobj) {
> + pr_err("kobject_create_and_add tdx failed\n");
> + return -EINVAL;
> + }

Subsystems, PCI for example [2], are slowly unwinding their usage of dynamic
sysfs_create_*() APIs in favor of static attribute groups. Dynamic
kobject_create_*() usage is even more of an anti-pattern for new code.

This goes away with static attribute group registration.

[2]: https://lore.kernel.org/linux-pci/20231019200110.GA1410324@bhelgaas/