On 8/14/2024 2:24 PM, Lizhi Hou wrote:
On 8/14/24 11:46, Jeffrey Hugo wrote:
On 8/14/2024 12:16 PM, Lizhi Hou wrote:
On 8/9/24 09:11, Jeffrey Hugo wrote:
On 8/5/2024 11:39 AM, Lizhi Hou wrote:Other headers are indirectly included by "aie2_pci.h" underneath.
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
new file mode 100644
index 000000000000..3660967c00e6
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/errno.h>
+#include <linux/firmware.h>
+#include <linux/iommu.h>
You are clearly missing linux/pci.h and I suspect many more.
aie2_pci.h also does not directly include linux/pci.h
it is aie2_pci.h --> amdxdna_pci_drv.h --> linux/pci.h.
It compiles without any issue.
I did not doubt that it compiled. To be clear, I'm pointing out what I believe is poor style - relying on implicit includes.
There is no reason that amdxdna_pci_drv.h needs to include linux/pci.h (at-least in this patch). That header file doesn't use any of the content from pci.h. If this were merged, I suspect it would be valid for someone to post a cleanup patch that removes pci.h from amdxdna_pci_drv.h.
The problem comes when someone does a tree wide refactor of some header - perhaps moving a function out of pci.h into something else. If they grep the source tree, they'll find amdxdna_pci_drv.h includes pci.h but really doesn't use it. They likely won't see aie2_pci.c which may break because of the refactor. Even more problematic is if pci.h is including something that you need, and you are not including it anywhere. The code will still compile, but maybe in the next kernel cycle pci.h no longer includes that thing. Your code will break.
The 4 includes you have here seems entirely too little, and I'm not clearly seeing the logic of what gets explicitly included vs what is implicitly included. firmware.h is explicitly included, but pci.h is not, yet it seems like you use a lot more from pci.h.
There is the include what you use project that attempts to automate this, although I don't know how well it works with kernel code - https://github.com/include-what-you-use/include-what-you-use
The fw may return interrupt 2 is assigned to hardware context. Then the driver may not deal with it in this case. I think it is ok to fail if the system has very limited resource.
That is possible. the driver supports different hardware. And the fw assigns vector for hardware context dynamically. So the driver needs to allocate all vectors ahead.+
+ ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ if (ret) {
+ XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret);
+ goto release_fw;
+ }
+
+ nvec = pci_msix_vec_count(pdev);
This feels weird. Can your device advertise variable number of MSI-X vectors? It only works if all of the vectors are used?
So, if the device requests N MSIs, but the host is only able to satisfy 1 (or some number less than N), the fw is completely unable to function?
Ok. I suspect you'll want to change that behavior in the future with a fw update, but if this is how things work today, then this is how the driver must be.