Re: [PATCH v3 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status

From: Alex Williamson
Date: Fri Jan 17 2025 - 16:31:40 EST


On Fri, 17 Jan 2025 21:13:52 +0000
Ankit Agrawal <ankita@xxxxxxxxxx> wrote:

> >> > We're accessing device memory here but afaict the memory enable bit of
> >> > the command register is in an indeterminate state.  What happens if you
> >> > use setpci to clear the memory enable bit or 'echo 0 > enable' before
> >> > binding the driver?  Thanks,
> >> >
> >> > Alex
> >>
> >> Hi Alex, sorry, I don't understand how we are accessing device memory here if
> >> C2C_LINK_BAR0_OFFSET and HBM_TRAINING_BAR0_OFFSET are BAR0 registers.
> >> But anyway, I tried 'echo 0 > <sysfs_path>/enable' before binding the device. I am not
> >> observing any issue, and the bind goes through.
> >>
> >> Or am I missing something?
> >
> > BAR0 is what I'm referring to as device memory.  We cannot access
> > registers in BAR0 unless the memory space enable bit of the command
> > register is set.  The nvgrace-gpu driver makes no effort to enable this
> > and I don't think the PCI core does before probe either.  Disabling
> > through sysfs will only disable if it was previously enabled, so
> > possibly that test was invalid.  Please try with setpci:
> >
> > # Read command register
> > $ setpci -s xxxx:xx:xx.x COMMAND
> > # Clear memory enable
> > $ setpci -s xxxx:xx:xx.x COMMAND=0:2
> > # Re-read command register
> > $ setpci -s xxxx:xx:xx.x COMMAND
> >
> > Probe the driver here, now that the memory enable bit should read back as
> > unset.  Thanks,
> >
> > Alex
>
> Ok, yeah. I tried disabling it through setpci, and the probe fails with ETIME.
> Should we check whether memory is disabled and return -EIO in that case, to
> differentiate it from a timeout?

No, the driver needs to enable memory space decode on the device around the
iomap and BAR0 accesses rather than assuming the initial state.  Thanks,

Alex