Re: [PATCH 02/12] uapi: habanalabs: add gaudi2 defines

From: Oded Gabbay
Date: Tue Jun 28 2022 - 04:21:09 EST


On Tue, Jun 28, 2022 at 9:33 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 27, 2022 at 11:26:10PM +0300, Oded Gabbay wrote:
> > @@ -456,7 +841,7 @@ struct hl_info_hw_ip_info {
> > __u32 num_of_events;
> > __u32 device_id;
> > __u32 module_id;
> > - __u32 reserved;
> > + __u32 decoder_enabled_mask;
> > __u16 first_available_interrupt_id;
> > __u16 server_type;
> > __u32 cpld_version;
> > @@ -466,12 +851,13 @@ struct hl_info_hw_ip_info {
> > __u32 psoc_pci_pll_div_factor;
> > __u8 tpc_enabled_mask;
> > __u8 dram_enabled;
> > - __u8 pad[2];
> > + __u8 reserved;
> > + __u8 mme_master_slave_mode;
>
> You are moving fields around (reserved moved to a different spot, and
> you were checking that reserved was always 0, right?) and renaming them.
> Is that going to break userspace?

No, I don't think it will break userspace. Those variables are
replaced with exactly the same field type (u32 -> u32 , u8 pad[2] ->
two u8 variables).

Anyway, this structure is only for output to the userspace. We just
get a pointer from userspace for a location to copy_to_user this
information. So I don't care what was in that location in the first
place.

And yes, reserved fields were always initialized to 0 by the driver.

>
>
>
>
> > __u8 cpucp_version[HL_INFO_VERSION_MAX_LEN];
> > __u8 card_name[HL_INFO_CARD_NAME_MAX_LEN];
> > - __u64 reserved2;
> > + __u64 tpc_enabled_mask_ext;
> > __u64 dram_page_size;
> > - __u32 reserved3;
> > + __u32 edma_enabled_mask;
> > __u16 number_of_user_interrupts;
> > __u16 pad2;
> > __u64 reserved4;
> > @@ -722,6 +1108,44 @@ struct hl_info_dev_memalloc_page_sizes {
> > __u64 page_order_bitmask;
> > };
> >
> > +#define HL_TPM_PCR_DATA_BUF_SZ 256
> > +#define HL_TPM_PCR_QUOTE_BUF_SZ 510 /* (512 - 2) 2 bytes used for size */
> > +#define HL_TPM_SIGNATURE_BUF_SZ 255 /* (256 - 1) 1 byte used for size */
> > +#define HL_TPM_PUB_DATA_BUF_SZ 510 /* (512 - 2) 2 bytes used for size */
> > +#define HL_TPM_CERTIFICATE_BUF_SZ 2046 /* (2048 - 2) 2 bytes used for size */
> > +
> > +/**
> > + * struct hl_info_tpm - attestation data of the boot from the TPM
> > + * @nonce: number only used once. random number provided by host. this also passed to the quote
> > + * command as a qualifying data.
> > + * @pcr_quote_len: length of the attestation quote data in bytes
> > + * @pub_data_len: length of the public data in bytes
> > + * @certificate_len: length of the certificate in bytes
> > + * @pcr_num_reg: number of PCR registers in the pcr_data array
> > + * @pcr_reg_len: length of each PCR register in the pcr_data array in bytes
> > + * @quote_sig_len: length of the attestation signature in bytes
> > + * @pcr_data: raw values of the PCR registers from the TPM
> > + * @pcr_quote: attestation data structure (TPM2B_ATTEST) from the TPM
> > + * @public_data: public key and certificate info from the TPM (outPublic + name + qualifiedName)
> > + * @certificate: certificate for the attestation data, read from the TPM NV mem
> > + * @quote_sig: signature structure (TPMT_SIGNATURE) of the attestation data
> > + */
> > +struct hl_info_tpm {
> > + __u32 nonce;
> > + __u16 pcr_quote_len;
> > + __u16 pub_data_len;
> > + __u16 certificate_len;
> > + __u8 pcr_num_reg;
> > + __u8 pcr_reg_len;
> > + __u8 quote_sig_len;
> > + __u8 pcr_data[HL_TPM_PCR_DATA_BUF_SZ];
> > + __u8 pcr_quote[HL_TPM_PCR_QUOTE_BUF_SZ];
> > + __u8 public_data[HL_TPM_PUB_DATA_BUF_SZ];
> > + __u8 certificate[HL_TPM_CERTIFICATE_BUF_SZ];
> > + __u8 quote_sig[HL_TPM_SIGNATURE_BUF_SZ];
> > + __u8 pad0[2];
>
> Do you always check that pad0 is set to 0 in the kernel code?
Yes, we always cleared it. This structure is always generated by the
kernel driver and consumed by the user code. Never vice-versa.

>
> > +};
> > +
> > enum gaudi_dcores {
> > HL_GAUDI_WS_DCORE,
> > HL_GAUDI_WN_DCORE,
> > @@ -742,6 +1166,7 @@ enum gaudi_dcores {
> > * @period_ms: Period value, in milliseconds, for utilization rate in range 100ms - 1000ms in 100 ms
> > * resolution. Currently not in use.
> > * @pll_index: Index as defined in hl_<asic type>_pll_index enumeration.
> > + * @tpm_nonce: Nonce number used for tpm attestation.
> > * @eventfd: event file descriptor for event notifications.
> > * @pad: Padding to 64 bit.
> > */
> > @@ -755,6 +1180,7 @@ struct hl_info_args {
> > __u32 ctx_id;
> > __u32 period_ms;
> > __u32 pll_index;
> > + __u32 tpm_nonce;
> > __u32 eventfd;
> > };
>
> You added a new field to the middle of a structure, did that just break
> userspace built with the old structure from working with a new kernel?
It's a union, not a structure.

>
> > @@ -1400,7 +1837,16 @@ struct hl_debug_params_bmon {
> >
> > /* Trace source ID */
> > __u32 id;
> > - __u32 pad;
> > +
> > + /* Control register */
> > + __u32 control;
> > +
> > + /* Two more address ranges that the user can request to filter */
> > + __u64 start_addr2;
> > + __u64 end_addr2;
> > +
> > + __u64 start_addr3;
> > + __u64 end_addr3;
> > };
>
> Adding to the end is good, but again, will old userspace and new kernel
> still work properly?
Yes. We get from the userspace the pointer to copy-to-user this
structure into, and the maximum size to copy.
In the driver we check the maximum size the user gave us vs. the
sizeof(structure).
We copy the minimum between the two sizes, so if we work vs. old
userspace which has an old version of this structure, we will copy the
amount of bytes that represent his version of the structure.

>
> >
> > struct hl_debug_params_spmu {
> > @@ -1409,7 +1855,11 @@ struct hl_debug_params_spmu {
> >
> > /* Number of event types selection */
> > __u32 event_types_num;
> > - __u32 pad;
> > +
> > + /* TRC configuration register values */
> > + __u32 pmtrc_val;
> > + __u32 trc_ctrl_host_val;
> > + __u32 trc_en_host_val;
> > };
>
> Same here, you increased the size.
Same answer.
Thanks,
Oded
>
> thanks,
>
> greg k-h