RE: [RFC v1 1/5] PCI: hv: Create and export hv_build_logical_dev_id()

From: Michael Kelley

Date: Tue Apr 14 2026 - 14:12:16 EST


From: Easwar Hariharan <easwar.hariharan@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, April 14, 2026 10:42 AM
>

[snip]

> >> Thanks for that explanation, that makes sense. I didn't see any serialization
> >> that would ensure that the VMBus path to communicate the child devices on the bus
> >> would complete before pci_scan_device() finds and finalizes the pci_dev. I think it's
> >
> > FWIW, hv_pci_query_relations() should be ensuring that the communication
> > has completed before it returns. It does a wait_for_response(), which ensures
> > that the Hyper-V host has sent the PCI_BUS_RELATIONS[2] response. However,
> > that message spins off work to the hbus->wq workqueue, so
> > hv_pci_query_relations() has a flush_workqueue() to ensure everything that
> > was queued has completed.
>
> Hm, I read the comment for the flush_workqueue() as addressing the "PCI_BUS_RELATIONS[2]
> message arrived before we sent the QUERY_BUS_RELATIONS message" race case, not as an
> "all child devices have definitely been received and processed in response to our
> QUERY_BUS_RELATIONS message". Also, knowing very little about the VMBus contract, I
> discounted the 100 ms timeout in wait_for_response() as a serialization guarantee.

Yeah, that timeout is so that the code can wake up every 100 ms to check
if the device has been rescinded (i.e., removed). If the device isn't
rescinded, wait_for_response() waits forever until a response comes in.
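For what it's worth, the pattern can be modeled roughly like this. This is a userspace sketch, not the kernel code; each loop iteration stands in for one 100 ms timeout expiring, and all the names here are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Rough userspace model of the wait_for_response() behavior described
 * above: wake up once per poll interval (100 ms in the driver) to check
 * whether the device has been rescinded; otherwise keep waiting until
 * the host's response arrives. Names and structure are illustrative,
 * not the kernel's.
 */
enum wait_result { WAIT_OK, WAIT_RESCINDED };

struct sim_device {
	bool rescinded;		/* set when the host rescinds the device */
	int response_at_tick;	/* tick when the response lands; -1 = never */
};

static enum wait_result wait_for_response_model(struct sim_device *dev)
{
	/* Each iteration models one 100 ms timeout expiring. */
	for (int tick = 0; ; tick++) {
		if (dev->rescinded)
			return WAIT_RESCINDED;
		if (dev->response_at_tick >= 0 && tick >= dev->response_at_tick)
			return WAIT_OK;
	}
}
```

The point being: the 100 ms value bounds only the rescind-check latency, not the overall wait.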

>
> Chalk it up to previous experience dealing with hardware that's *supposed* to be
> spec-compliant and complete initialization within specified timings. :)
>
> I see now that the flush is sufficient though.
>
> >
> > Thinking more about the "hv_pcibus_installed" case, if that path is ever
> > triggered, I don't think anything needs to be done with the logical device ID.
> > The vPCI device has already been fully initialized on the Linux side, and its
> > logical device ID would not change.
> >
> > So I think you could construct the full logical device ID once
> > hv_pci_query_relations() returns to hv_pci_probe().
>
> Let me think about this more and decide between the logical ID and full bus GUID
> options.
>
> >
> >> safest to take the approach to communicate the GUID, and find the function number from
> >> the pci_dev. This does mean that there will be an essentially identical copy of
> >> hv_build_logical_dev_id() in the IOMMU code, but a comment can explain that.
> >
> > With this alternative approach, is there a need to communicate the full
> > GUID to the pvIOMMU driver? Couldn't you just communicate bytes 4 thru
> > 7, which would be logical device ID minus the function number?
>
> Yes, we could just communicate bytes 4 through 7 but the pvIOMMU version of the build logical
> ID function would diverge from the pci-hyperv version. I figured if we say (in a comment)
> that this is the same ID as generated in pci-hyperv, it's better for future readers if the
> two functions are clearly identical at first glance.
>
> It's also possible to change the pci-hyperv function to only take bytes 4 through 7 instead of the
> full GUID, but I rather think we don't need that impedance mismatch of bytes 4 through 7 of the
> GUID becoming bytes 0 through 3 of a u32.
>
> >
> >>
> >>>>
> >>>>>
> >>>>> So have the Hyper-V PV IOMMU driver provide an EXPORTed function to accept
> >>>>> a PCI domain ID and the related logical device ID. The PV IOMMU driver is
> >>>>> responsible for storing this data in a form that it can later search. hv_pci_probe()
> >>>>> calls this new function when it instantiates a new PCI pass-thru device. Then when
> >>>>> the IOMMU driver needs to attach a new device, it can get the PCI domain ID
> >>>>> from the struct pci_dev (or struct pci_bus), search for the related logical device
> >>>>> ID in its own data structure, and use it. The pci-hyperv driver has a dependency
> >>>>> on the IOMMU driver, but that's a dependency in the desired direction. The
> >>>>> PCI domain ID and logical device ID are just integers, so no data structures are
> >>>>> shared.
> >>>>
> >>>> In a previous reply on this thread, you raised the uniqueness issue of bytes 4 and 5
> >>>> of the GUID being used to create the domain number. I thought this approach could
> >>>> help with that too, but as I coded it up, I realized that using the domain number
> >>>> (not guaranteed to be unique) to search for the bus instance GUID (guaranteed to be unique)
> >>>> is the wrong way around. It is unfortunately the only available key in the pci_dev
> >>>> handed to the pvIOMMU driver in this approach though...
> >>>>
> >>>> Do you think that's a fatal flaw?
> >>>
> >>> There are two uniqueness problems, which I didn't fully separate conceptually
> >>> until writing this. One problem is constructing a PCI domain ID that Linux can use
> >>> to identify the virtual PCI bus that the Hyper-V PCI driver creates for each vPCI
> >>> device. The Hyper-V virtual PCI driver uses GUID bytes 4 and 5, and recognizes
> >>> that they might not be unique. So there's code in hv_pci_probe() to pick another
> >>> number if there's a duplicate. Hyper-V doesn't really care how Linux picks the
> >>> domain ID for the virtual PCI bus as it's purely a Linux construct.
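That derivation plus the duplicate fallback can be sketched like so. This is a userspace model, not the driver's code; the flat in-use array and the search order are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace model of the domain-ID selection described above: the
 * preferred domain number comes from GUID bytes 4 and 5, and on a
 * collision we fall back to the first free number, similar in spirit
 * to the fallback done in hv_pci_probe(). The flat in-use array and
 * search order are illustrative assumptions.
 */
#define DOM_MAX 0x10000

static bool dom_in_use[DOM_MAX];

static int claim_domain(const uint8_t guid[16])
{
	unsigned int dom = (unsigned int)guid[5] << 8 | guid[4];

	if (!dom_in_use[dom]) {
		dom_in_use[dom] = true;
		return (int)dom;
	}

	/* Duplicate: Hyper-V doesn't care which domain Linux picks. */
	for (unsigned int d = 0; d < DOM_MAX; d++) {
		if (!dom_in_use[d]) {
			dom_in_use[d] = true;
			return (int)d;
		}
	}
	return -1;	/* no free domain number */
}
```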
> >>
> >> This part matters for the IOMMU driver as it is the key we will use to search the data
> >> structure to find the right GUID to construct the logical dev ID that Hyper-V recognizes.
> >
> > Right. But the Hyper-V vPCI driver in Linux ensures that the domain ID is unique
> > in the sense that two active vPCI devices will not have the same domain ID. So
> > the pvIOMMU driver should not encounter any ambiguity when looking up the
> > logical device ID.
>
> Agreed, that was a fragment of a thought that I neglected to delete before sending.
> Apologies.
>
> > As you noted below, it's possible that a vPCI device could go
> > away, and another vPCI device could be added that ends up with a domain ID
> > that was previously used. When that added vPCI device is setup by the Hyper-V
> > vPCI driver, it will inform the pvIOMMU driver about the domain ID -> logical
> > device ID mapping, and it might overwrite an existing mapping if the newly
> > added vPCI device ended up with a domain ID that had previously been used.
> > And that's fine.
>
> Yes.
>
> >>
> >>>
> >>> The second problem is the logical device ID that Hyper-V interprets to
> >>> identify a vPCI device in hypercalls such as HVCALL_RETARGET_INTERRUPT
> >>> and the new pvIOMMU related hypercalls. This logical device ID uses
> >>> GUID bytes 4 thru 7 (minus 1 bit). I don’t think Linux uses the
> >>> logical device ID for anything. Since only Hyper-V interprets it, Hyper-V
> >>> must somehow be ensuring uniqueness of bytes 4 thru 7 (minus 1 bit).
> >>> That's something to confirm with the Hyper-V team. If they are just hoping
> >>> for the best, I don't know how Linux can solve the problem.
> >>
> >> I checked with the Hyper-V vPCI team on this aspect and the only guarantee that
> >> they provide is that, at any given time, there will only be 1 device with a given
> >> logical ID attached to a VM.
> >
> > OK, so Hyper-V is guaranteeing the uniqueness of vPCI device GUID bytes 4
> > thru 7 across all vPCI devices that are attached to a VM at a given point in time.
> > That's good!
>
> Technically, they're guaranteeing only that the *combination* of GUID bytes 4 through 7 AND
> the slot number will be unique across all vPCI devices that are attached to a VM at a given
> point in time. As you say below, while we have in practice not seen multiple devices on a
> vPCI bus, the vPCI team asserts that there is no restriction in the stack on doing so.
>

Agreed.

> >
> >> Once a device has been removed, everything about it is
> >> forgotten from the Hyper-V stack's perspective, and nothing in the Hyper-V stack would
> >> prevent a scenario where, for example, a data movement accelerator is attached with
> >> logical ID X, then revoked, and let's say a NIC is attached with the same logical ID X.
> >
> > And the "forgetting" behavior is the same in Linux. Once the device is removed,
> > Linux forgets everything about it. If a new vPCI device shows up and happens
> > to have the same GUID as a previous device, that should not cause any problems
> > in Linux.
> >
> >>
> >> Also, FWIW, they also stated that the GUID is not unique and cannot be
> >> guaranteed to be unique because it's the GUID for the bus, not the individual
> >> devices.
> >
> > I'm not sure I understand this statement. Is this referring to the possibility
> > that a vPCI "device" that Hyper-V offers to the guest might have multiple
> > functions?
>
> Yes, apologies for the vagueness.
>
> > The vPCI device driver in Linux has code to recognize this case,
> > but I'm not aware of any current cases where it happens. In such a case,
> > Linux should create a single PCI bus abstraction with multiple devices
> > attached to it, with each device being a different function. If Hyper-V
> > did ever offer a multiple-function configuration, there might be some
> > debugging to do in the Hyper-V vPCI driver in Linux!
> >
> > We shortcut the terminology by referring to a vPCI "device", and assuming
> > that devices and busses are 1-to-1. But the design allows for multiple devices
> > as different functions on the same bus.
> >
> >>
>
> <snip>
>
> >>>>>
> >>>>> I don't think the pci-hyperv driver even needs to tell the IOMMU driver to
> >>>>> remove the information if a PCI pass-thru device is unbound or removed, as
> >>>>> the logical device ID will be the same if the device ever comes back. At worst,
> >>>>> the IOMMU driver can simply replace an existing logical device ID if a new one
> >>>>> is provided for the same PCI domain ID.
> >>>>
> >>>> As above, replacing a unique GUID when a match is found for a non-unique
> >>>> key value may be prone to failure if the device that came "back" is not in
> >>>> fact the same device (or class of device) that went away, and merely happens
> >>>> to have the same domain number, either because bytes 4 and 5 are identical
> >>>> or because of a collision in pci_domain_nr_dynamic_ida.
> >>
> >> Given the vPCI team's statements (above), I think we will need to handle unbind or
> >> removal and ensure the pvIOMMU driver's data structure is invalidated when either
> >> happens.
> >
> > The generic PCI code should handle detaching from the pvIOMMU. So I'm assuming
> > your statement is specifically about the mapping from domain ID to logical device ID.
>
> Yes, apologies for the vagueness (again).
>
> > I still think removing it may be unnecessary since adding a mapping for a new vPCI
> > device with the same domain ID but different logical device ID could just overwrite
> > any existing mapping. And leaving a dead mapping in the pvIOMMU data structures
> > doesn’t actually hurt anything. On the other hand, removing/invalidating it is
> > certainly more tidy and might prevent some confusion down the road.
> >
>
> Yes, if the data structure maps domain -> logical ID, we can do the overwrite as you say.
> With my approach of informing the pvIOMMU driver of the entire (bus) GUID, we would want
> to be careful that we don't assume the 1:1 bus<->device case and overwrite an existing
> device entry with a new device that's on the same bus.

Yes, that's a valid point. I was assuming that the pvIOMMU would use the
domain ID as the lookup key, since the domain ID is directly available from the
struct pci_dev that is an input parameter to the IOMMU functions. But in the
not 1:1 case, that domain ID might refer to a bus with multiple functions. The
logical device IDs for those devices will be the same except for the low order
3 bits that encode the function number. So maybe the domain ID maps
to a partial logical device ID, and the pvIOMMU driver must always add in the
function number so the not 1:1 case works.
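To make that concrete, here's a sketch of the partial-ID idea. The byte layout follows what pci-hyperv does when it builds device IDs for hypercalls, but treat the exact ordering as an assumption to double-check against the driver; the flat array and names are purely illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the "partial logical device ID" proposal: the pvIOMMU
 * driver stores GUID bytes 4 thru 7 (with the low 3 bits masked off)
 * keyed by domain ID, and ORs in the PCI function number at lookup
 * time so the not-1:1 bus/device case works. Byte ordering follows
 * the device-ID construction in pci-hyperv but should be verified;
 * DOMAINS and the flat array are illustrative only.
 */
#define DOMAINS 0x10000

static uint32_t partial_id[DOMAINS];	/* domain -> partial logical dev ID */

static void record_partial_id(uint16_t domain, const uint8_t guid[16])
{
	/* Overwriting an existing entry for a reused domain is fine. */
	partial_id[domain] = ((uint32_t)guid[5] << 24) |
			     ((uint32_t)guid[4] << 16) |
			     ((uint32_t)guid[7] << 8)  |
			     (guid[6] & 0xf8);
}

static uint32_t logical_dev_id(uint16_t domain, unsigned int func)
{
	/* The low 3 bits carry the function number. */
	return partial_id[domain] | (func & 0x7);
}
```

With this shape, two functions on the same virtual bus share everything but the low 3 bits, and a reused domain ID simply overwrites the stale entry.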

Would the pvIOMMU driver do anything with the full GUID, except extract
bytes 4 through 7? There's no way I see to use the full GUID as the lookup
key.

Michael