RE: [EXTERNAL] [PATCH 2/2] PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2

From: Long Li
Date: Mon Dec 02 2019 - 19:50:05 EST


>Subject: RE: [EXTERNAL] [PATCH 2/2] PCI: hv: Add support for protocol 1.3 and
>support PCI_BUS_RELATIONS2
>
>> From: linux-hyperv-owner@xxxxxxxxxxxxxxx
>> Sent: Friday, November 22, 2019 5:57 PM ...
>> @@ -63,6 +63,7 @@
>> enum pci_protocol_version_t {
>> PCI_PROTOCOL_VERSION_1_1 = PCI_MAKE_VERSION(1, 1), /*
>Win10
>> */
>> PCI_PROTOCOL_VERSION_1_2 = PCI_MAKE_VERSION(1, 2), /* RS1
>*/
>> + PCI_PROTOCOL_VERSION_1_3 = PCI_MAKE_VERSION(1, 3), /* VB
>*/
>> };
>
>What is "VB" ? Can we use a more meaningful name here? :-)

Vibranium.

>
>> +struct pci_function_description2 {
>> + u16 v_id; /* vendor ID */
>> + u16 d_id; /* device ID */
>> + u8 rev;
>> + u8 prog_intf;
>> + u8 subclass;
>> + u8 base_class;
>> + u32 subsystem_id;
>> + union win_slot_encoding win_slot;
>
>space -> TAB?
>
>> +/*
>> + * Set NUMA node for the devices on the bus */ static void
>> +pci_assign_numa_node(struct hv_pcibus_device *hbus)
>
>IMO we'd better add a "hv_" prefix to this function's name, otherwise it looks
>more like a generic PCI subsystem API.

I will send v2 to address comments above.

>
>> +{
>> + struct pci_dev *dev;
>> + struct pci_bus *bus = hbus->pci_bus;
>> + struct hv_pci_dev *hv_dev;
>> +
>> + list_for_each_entry(dev, &bus->devices, bus_list) {
>> + hv_dev = get_pcichild_wslot(hbus, devfn_to_wslot(dev-
>>devfn));
>> + if (!hv_dev)
>> + continue;
>> +
>> + if (hv_dev->desc.flags &
>HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
>> + set_dev_node(&dev->dev, hv_dev-
>>desc.virtual_numa_node);
>> + }
>> +}
>
>Can you please give a brief background introduction to dev->numa_node, e.g.
>how is it used here? -- is it used when a PCI device driver (e.g. the mlx
>driver) asks the kernel to allocate memory for DMA, or allocate MSI interrupts?
>How big is the performance gain in the tests? I'm curious as it looks unclear to
>me how dev->numa_node is used by the PCI subsystem.

numa_node can be used by a device driver (if it's numa aware) to setup its internal data structures and allocate its MSI.

As an example, you can look at "drivers/net/ethernet/mellanox/mlx4/main.c: mlx4_load_one()":
It stores the value at : dev->numa_node = dev_to_node(&pdev->dev);
after that, dev->numa_node is used through driver.

numa_node is also exported though /sys to user-mode, so a NUMA aware application (e.g. MPI) can figure out how to pin itself to selected CPUs for the best latency.

>
>Thanks,
>-- Dexuan