Re: [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status

From: Ankit Agrawal
Date: Sun Jan 19 2025 - 22:35:38 EST


> No, this is standard PCI driver stuff, everything you need is already
> there.  Probably pci_enable_device() and some variant of
> pci_request_regions().

OK, thanks. I'll take a look at that.

>> > > Does this delay even need to happen in the probe function, or could it
>> > > happen in the open_device callback?  That would still be before user
>> > > access, but if we expect it to generally work, it would allow the
>> > > training to happen in the background up until the user tries to open
>> > > the device.  Thanks,
>> > >
>> > > Alex
>> >
>> > The reasoning is that since this is purely the bare-metal hardware coming
>> > to a proper state during boot, the nvgrace module should probably wait
>> > for startup to complete in probe() rather than defer it to open() time.
>>
>> If the driver is statically loaded, that might mean you're willing to
>> stall boot for up to 30s.  In practice is this ever actually going to
>> fail?  Thanks,

No, I have not seen this time out in my testing. 30s is considered
sufficient to determine whether the hardware is in a bad state.
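For reference, the bounded wait amounts to a loop like the one below. This
is a simplified userspace sketch, not the patch itself: the ready/delay
hooks (ready_fn, delay_fn), the helper name wait_device_ready() and the
100 ms poll interval are hypothetical, and in-kernel code would more likely
use a helper such as read_poll_timeout() from <linux/iopoll.h>. Only the
30s upper bound comes from this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hook: returns true once HBM training and the C2C link
 * report ready. In a real driver this would read device registers. */
typedef bool (*ready_fn)(void *ctx);

/* Hypothetical delay hook, so the loop can be exercised without sleeping. */
typedef void (*delay_fn)(unsigned int ms);

/*
 * Poll ready() every interval_ms until it succeeds or timeout_ms has
 * elapsed. Returns 0 on success, -ETIMEDOUT if the device never came up,
 * or -EINVAL for a zero interval (which would otherwise spin forever).
 */
static int wait_device_ready(ready_fn ready, void *ctx, delay_fn delay,
			     unsigned int timeout_ms, unsigned int interval_ms)
{
	unsigned int waited_ms = 0;

	if (interval_ms == 0)
		return -EINVAL;

	for (;;) {
		if (ready(ctx))
			return 0;
		if (waited_ms >= timeout_ms)
			return -ETIMEDOUT;
		delay(interval_ms);
		waited_ms += interval_ms;
	}
}

/* Test double: a fake device that reports ready on poll ready_after + 1. */
struct fake_dev {
	unsigned int polls;
	unsigned int ready_after;
};

static bool fake_ready(void *ctx)
{
	struct fake_dev *dev = ctx;

	return ++dev->polls > dev->ready_after;
}

/* Test double: skip the actual sleep. */
static void no_delay(unsigned int ms)
{
	(void)ms;
}
```

With a 30s timeout and a 100 ms interval, a device that never comes up is
polled 301 times before the helper gives up, which is the worst-case stall
being discussed here.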

> On second thought, I guess a vfio-pci variant driver can't
> automatically bind to a device, whether statically built or not, so
> maybe this isn't a concern.  I'm not sure if there are other concerns
> with busy waiting for up to 30s at driver probe.  Thanks,
>
> Alex
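If the wait were moved to open_device as you suggested, a minimal sketch
might look like this. All names here (struct gpu_dev, sketch_probe(),
sketch_open_device(), wait_for_hw()) are hypothetical stand-ins rather than
the nvgrace-gpu or vfio API, and locking is omitted. probe() returns
immediately, and the bounded wait runs only on the first open, with its
result cached.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the real nvgrace-gpu structure. */
struct gpu_dev {
	bool ready_checked;  /* has the readiness wait already run? */
	int ready_status;    /* cached result: 0 or -ETIMEDOUT */
	unsigned int waits;  /* how many times the wait actually ran */
	bool hw_ok;          /* fake hardware outcome for this sketch */
};

/* Stand-in for the bounded poll; would wait up to 30s on real hardware. */
static int wait_for_hw(struct gpu_dev *dev)
{
	dev->waits++;
	return dev->hw_ok ? 0 : -ETIMEDOUT;
}

/* probe(): no waiting, so boot is not stalled while training runs. */
static int sketch_probe(struct gpu_dev *dev)
{
	dev->ready_checked = false;
	dev->ready_status = 0;
	dev->waits = 0;
	return 0;
}

/* open_device(): wait once, on the first open, and cache the outcome. */
static int sketch_open_device(struct gpu_dev *dev)
{
	if (!dev->ready_checked) {
		dev->ready_status = wait_for_hw(dev);
		dev->ready_checked = true;
	}
	return dev->ready_status;
}
```

Caching a failed result means a device that missed the 30s window stays
unusable until it is re-probed, which matches treating a timeout as a bad
hardware state; retrying the wait on each open would be the alternative.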