Re: [PATCH V2 1/4] drivers/fpga/amd: Add new driver amd versal-pci
From: Xu Yilun
Date: Wed Feb 05 2025 - 23:16:29 EST
On Sun, Jan 26, 2025 at 11:46:54AM -0800, Yidong Zhang wrote:
>
>
> On 1/26/25 02:32, Xu Yilun wrote:
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Tue, Dec 10, 2024 at 10:37:30AM -0800, Yidong Zhang wrote:
> > > AMD Versal based PCIe card, including V70, is designed for AI inference
> > > efficiency and is tuned for video analytics and natural language processing
> > > applications.
> > >
> > > The driver architecture:
> > >
> > > +---------+ Communication +---------+ Remote +-----+------+
> > > | | Channel | | Queue | | |
> > > | User PF | <============> | Mgmt PF | <=======>| FW | FPGA |
> > > +---------+ +---------+ +-----+------+
> > > PL Data base FW
> > > APU FW
> > > PL Data (copy)
> > > - PL (FPGA Program Logic)
> > > - FW (Firmware)
> > >
> > > There are 2 separate drivers from the original XRT[1] design.
> > > - UserPF driver
> > > - MgmtPF driver
> > >
> > > The new AMD versal-pci driver will replace the MgmtPF driver for Versal
> > > PCIe card.
> > >
> > > The XRT[1] is already open-sourced. It includes solution of runtime for
> > > many different type of PCIe Based cards. It also provides utilities for
> > > managing and programming the devices.
> > >
> > > The AMD versal-pci stands for AMD Versal brand PCIe device management
> > > driver. This driver provides the following functionalities:
> > >
> > > - module and PCI device initialization
> > > this driver will attach to specific device id of V70 card;
> > > the driver will initialize itself based on bar resources for
> > > - communication channel:
> > > a hardware message service between mgmt PF and user PF
> > > - remote queue:
> > > a hardware queue based ring buffer service between mgmt PF and PCIe
> > > hardware firmware for programming FPGA Program Logic, loading
> > > firmware and checking card healthy status.
> > >
> > > - programming FW
> > > - The base FW is downloaded onto the flash of the card.
> > > - The APU FW is downloaded once after a POR (power on reset).
> > > - Reloading the MgmtPF driver will not change any existing hardware.
> > >
> > > - programming FPGA hardware binaries - PL Data
> > > - using fpga framework ops to support re-programing FPGA
> > > - the re-programming request will be initiated from the existing UserPF
> > > driver only, and the MgmtPF driver load the matched PL Data after
> > > receiving request from the communication channel. The matching PL
> >
> > I think this is not the way the FPGA generic framework should do. A FPGA
> > region user (your userPF driver) should not also be the reprogram requester.
> > The user driver cannot deal with the unexpected HW change if it happens.
> > Maybe after reprogramming, the user driver cannot match the device
> > anymore, and if user driver is still working on it, crash.
>
> One thing to clarify. The current design is:
>
> The userPF driver is the only requester. The mgmtPF has no uAPI to reprogram
> the FPGA.
>
>
> >
> > The expected behavior is, the FPGA region removes user devices (thus
> > detaches user drivers), does reprogramming, re-enumerates/rescans and
> > matches new devices with new drivers. And I think that's what Nava is
> > working on.
> >
>
> Nava's work is different than our current design, our current design is:
>
> the separate userPF driver will detach all services before requesting to the
> mgmtPF to program the FPGA, and after the programming is done, the userPF
> will re-enumerate/rescan the matching new devices.
That's not align with the Device-Driver Model, A device driver should not
re-enumerate/rescan a matching device.
>
> The mgmtPF is a util driver which is responsible for communicating with the
> mgmtPF PCIe bar resources.
>
>
> > BTW, AFAICS the expected flow is easier to implement for of-fpga-region,
> > but harder for PCI devices. But I think that's the right direction and
> > should try to work it out.
>
> Could I recap the suggested design if I understand that correctly...
>
> You are thinking that the mgmtPF (aka. versal-pci) driver should have a uAPI
This should be the unified fpga-region class uAPI.
> to trigger the FPGA re-programing; and using Nava's callback ops to detach
> the separate userPF driver; after re-programing is done, re-attch the userPF
No need to detach a specific driver, remove all devices in the
fpga-region. I imagine this could also be a generic flow for all PCI/e
based FPGA cards.
Thanks,
Yilun
> driver and allow the userPF driver re-enumerate all to match the new
> hardware.
>
> I think my understanding is correct, it is doable.
>
> As long as we can keep our userPF driver as separate driver, the code change
> won't be too big.
>
> >
> > Thanks,
> > Yilun