Re: [PATCH V2 1/4] drivers/fpga/amd: Add new driver amd versal-pci

From: Yidong Zhang
Date: Mon Feb 10 2025 - 06:33:23 EST




On 2/6/25 20:40, Xu Yilun wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Thu, Feb 06, 2025 at 07:16:25PM -0800, Yidong Zhang wrote:


On 2/6/25 18:19, Xu Yilun wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


No need to detach a specific driver, remove all devices in the
fpga-region. I imagine this could also be a generic flow for all PCI/e
based FPGA cards.

I see your point. Is there an existing example in current fpga driver for
PCI/e based cards?

No. The fpga-region re-enumeration arch is still WIP, so no existing
implementation.

Got you. I can also help to test or provide feedback.
Actually, I had sent Nava my protype of using configfs for the non-OF
driver. He should have the updated patch soon.



We will need to talk to our management team and re-design our driver.
I think we have 2 approaches:
1) Align with linux fpga, having one driver for both userPF and mgmtPF; or

Don't get you. Linux FPGA doesn't require one driver for both PFs.

I assume when you said "generic flow for removing all devices in
fpga-region" means that there is a single driver which use the fpga-region
callbacks to remove all devices and then re-progream the FPGA.

On AMD V70 hardware, the userPF has different deviceid than the mgmtPF.
Thus, our current design is having 2 separate pcie drivers for each
different deviceid.

Thus, in our current design, the fpga-region should be in the userPF driver
which has callbacks to remove all devices. But in mgmtPF, it is more like a

A question, if the FPGA logic is cleared, does the userPF still exist on
PCIe bus?

I am not sure I fully understand your question by "FPGA logic is cleared". Based on our design, the userPF exists, but all services need to be re-configured after hardware is changed.



utility which only handles request from the userPF but it has the management
privilege. Usually, cloud vendors require the mgmtPF deployed in their
secured domain (can be a separate physical machine).

I though mgmtPF & userPF are just 2 functions on one PCIe card, they are not?
Then how the mgmtPF writes data to another physical machine and
influence the userPF?


They are functions but they also have different device id thus we can bind different driver onto each of them.

I think I should interrupt it accurately as Host and VM. See details here please:
https://xilinx.github.io/XRT/2019.2/html/cloud_vendor_support.html

But, my point is that we do have 2 separate drivers.

Please let me know if you need more detailed info.

/David

Thanks,
Yilun


We can keep 2 drivers, one for userPF, one for mgmtPF. The userPF and mgmtPF
communicate each other via the communication channel because they can be
physically on different machine.

Are you looking for us to upstream both of them together?
If yes, I still think that the fpga-region should be in the userPF driver.
The mgmtPF driver is still a utility driver.

Please correct me if I am wrong.:)


2) find a different location (maybe driver/misc) for the version-pci driver,
because it is an utility driver, not need to be tied with fpga framework.

I'm not the misc maintainer, but I assume in-tree utility driver +
out-of-tree client driver is not generally welcomed.

Great info! Thanks! I will have to discuss this with my team too.

My understanding, so far based on your comments above, the drivers/fpga
prefer to use fpga-region ops to handle FPGA re-program changes.

The current versal-pci driver is a utility driver.
The fpga-region should be within the userPF driver.

We can try to make our userPF driver in-tree as well. But the current plan
is to do it later.

If you prefer we do both of them together. I can talk to my team.

Thanks,
David

Thanks,
Yilun


Please let me know you thoughts. Which way is acceptable by you.

Thanks,
David

Thanks,
Yilun

driver and allow the userPF driver re-enumerate all to match the new
hardware.

I think my understanding is correct, it is doable.

As long as we can keep our userPF driver as separate driver, the code change
won't be too big.


Thanks,
Yilun