Re: [PATCH V2 1/4] drivers/fpga/amd: Add new driver amd versal-pci

From: Yidong Zhang
Date: Wed Feb 05 2025 - 23:33:08 EST




On 2/5/25 20:15, Xu Yilun wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Sun, Jan 26, 2025 at 11:46:54AM -0800, Yidong Zhang wrote:


On 1/26/25 02:32, Xu Yilun wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Tue, Dec 10, 2024 at 10:37:30AM -0800, Yidong Zhang wrote:
AMD Versal based PCIe card, including V70, is designed for AI inference
efficiency and is tuned for video analytics and natural language processing
applications.

The driver architecture:

+---------+ Communication +---------+ Remote +-----+------+
| | Channel | | Queue | | |
| User PF | <============> | Mgmt PF | <=======>| FW | FPGA |
+---------+ +---------+ +-----+------+
PL Data base FW
APU FW
PL Data (copy)
- PL (FPGA Program Logic)
- FW (Firmware)

There are 2 separate drivers from the original XRT[1] design.
- UserPF driver
- MgmtPF driver

The new AMD versal-pci driver will replace the MgmtPF driver for Versal
PCIe card.

The XRT[1] is already open-sourced. It includes solution of runtime for
many different type of PCIe Based cards. It also provides utilities for
managing and programming the devices.

The AMD versal-pci stands for AMD Versal brand PCIe device management
driver. This driver provides the following functionalities:

- module and PCI device initialization
this driver will attach to specific device id of V70 card;
the driver will initialize itself based on bar resources for
- communication channel:
a hardware message service between mgmt PF and user PF
- remote queue:
a hardware queue based ring buffer service between mgmt PF and PCIe
hardware firmware for programming FPGA Program Logic, loading
firmware and checking card healthy status.

- programming FW
- The base FW is downloaded onto the flash of the card.
- The APU FW is downloaded once after a POR (power on reset).
- Reloading the MgmtPF driver will not change any existing hardware.

- programming FPGA hardware binaries - PL Data
- using fpga framework ops to support re-programing FPGA
- the re-programming request will be initiated from the existing UserPF
driver only, and the MgmtPF driver load the matched PL Data after
receiving request from the communication channel. The matching PL

I think this is not the way the FPGA generic framework should do. A FPGA
region user (your userPF driver) should not also be the reprogram requester.
The user driver cannot deal with the unexpected HW change if it happens.
Maybe after reprogramming, the user driver cannot match the device
anymore, and if user driver is still working on it, crash.

One thing to clarify. The current design is:

The userPF driver is the only requester. The mgmtPF has no uAPI to reprogram
the FPGA.



The expected behavior is, the FPGA region removes user devices (thus
detaches user drivers), does reprogramming, re-enumerates/rescans and
matches new devices with new drivers. And I think that's what Nava is
working on.


Nava's work is different than our current design, our current design is:

the separate userPF driver will detach all services before requesting to the
mgmtPF to program the FPGA, and after the programming is done, the userPF
will re-enumerate/rescan the matching new devices.

That's not align with the Device-Driver Model, A device driver should not
re-enumerate/rescan a matching device.

Thanks! Yilun.

I will need to discuss this in my team. Our current userPF driver organize services (rom, sensor, msix, etc.) as platform driver and re-enumerate (online/offline) when there is a hardware change.




The mgmtPF is a util driver which is responsible for communicating with the
mgmtPF PCIe bar resources.


BTW, AFAICS the expected flow is easier to implement for of-fpga-region,
but harder for PCI devices. But I think that's the right direction and
should try to work it out.

Could I recap the suggested design if I understand that correctly...

You are thinking that the mgmtPF (aka. versal-pci) driver should have a uAPI

This should be the unified fpga-region class uAPI.

to trigger the FPGA re-programing; and using Nava's callback ops to detach
the separate userPF driver; after re-programing is done, re-attch the userPF

No need to detach a specific driver, remove all devices in the
fpga-region. I imagine this could also be a generic flow for all PCI/e
based FPGA cards.

I see your point. Is there an existing example in current fpga driver for PCI/e based cards?

We will need to talk to our management team and re-design our driver.
I think we have 2 approaches:
1) Align with linux fpga, having one driver for both userPF and mgmtPF; or
2) find a different location (maybe driver/misc) for the version-pci driver, because it is an utility driver, not need to be tied with fpga framework.

Please let me know you thoughts. Which way is acceptable by you.

Thanks,
David

Thanks,
Yilun

driver and allow the userPF driver re-enumerate all to match the new
hardware.

I think my understanding is correct, it is doable.

As long as we can keep our userPF driver as separate driver, the code change
won't be too big.


Thanks,
Yilun