Re: [PATCH V2 00/10] AMD XDNA driver

From: Lizhi Hou
Date: Tue Sep 03 2024 - 13:51:01 EST


Hi Jeffrey,


I have completed the code changes based on your comments. And we are still working on the documentation.

Should I send out another patch set with only the code changes for your to review? Or you would wait for the documentation to review together?


Thanks,

Lizhi

On 8/14/24 13:06, Lizhi Hou wrote:

On 8/14/24 11:49, Jeffrey Hugo wrote:
On 8/12/2024 12:16 PM, Lizhi Hou wrote:

On 8/9/24 08:21, Jeffrey Hugo wrote:
On 8/5/2024 11:39 AM, Lizhi Hou wrote:
This patchset introduces a new Linux Kernel Driver, amdxdna for AMD NPUs.
The driver is based on Linux accel subsystem.

NPU (Neural Processing Unit) is an AI inference accelerator integrated
into AMD client CPUs. NPU enables efficient execution of Machine Learning
applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
architecture [1].

AMD NPU consists of the following components:

   - Tiled array of AMD AI Engine processors.
   - Micro Controller which runs the NPU Firmware responsible for
     command processing, AIE array configuration, and execution management.
   - PCI EP for host control of the NPU device.
   - Interconnect for connecting the NPU components together.
   - SRAM for use by the NPU Firmware.
   - Address translation hardware for protected host memory access by the
     NPU.

NPU supports multiple concurrent fully isolated contexts. Concurrent
contexts may be bound to AI Engine array spatially and or temporarily.

The driver is licensed under GPL-2.0 except for UAPI header which is
licensed GPL-2.0 WITH Linux-syscall-note.

User mode driver stack consists of XRT [2] and AMD AIE Plugin for IREE [3].

Is there a special branch with the code?  I don't see any of the uAPI in either project when searching for the ioctl codes or ioctl structures.

Please see git repo: https://github.com/amd/xdna-driver

This contains the out tree driver and shim code which interact with driver. E.g.

https://github.com/amd/xdna-driver/blob/main/src/shim/bo.cpp#L18

Ok, I need to have a look at this.  Long term is the plan to move the shim to the XRT repo once the driver is merged upstream?
Yes.




The firmware for the NPU is distributed as a closed source binary, and has
already been pushed to the DRM firmware repository [4].

[1] https://www.amd.com/en/technologies/xdna.html
[2] https://github.com/Xilinx/XRT
[3] https://github.com/nod-ai/iree-amd-aie
[4] https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu

Changes since v1:
- Remove some inline defines
- Minor changes based code review comments

Lizhi Hou (10):
   accel/amdxdna: Add a new driver for AMD AI Engine
   accel/amdxdna: Support hardware mailbox
   accel/amdxdna: Add hardware resource solver
   accel/amdxdna: Add hardware context
   accel/amdxdna: Add GEM buffer object management
   accel/amdxdna: Add command execution
   accel/amdxdna: Add suspend and resume
   accel/amdxdna: Add error handling
   accel/amdxdna: Add query functions
   accel/amdxdna: Add firmware debug buffer support

  MAINTAINERS                                   |   9 +
  drivers/accel/Kconfig                         |   1 +
  drivers/accel/Makefile                        |   1 +
  drivers/accel/amdxdna/Kconfig                 |  15 +
  drivers/accel/amdxdna/Makefile                |  22 +
  drivers/accel/amdxdna/TODO                    |   4 +
  drivers/accel/amdxdna/aie2_ctx.c              | 949 ++++++++++++++++++
  drivers/accel/amdxdna/aie2_error.c            | 349 +++++++
  drivers/accel/amdxdna/aie2_message.c          | 775 ++++++++++++++
  drivers/accel/amdxdna/aie2_msg_priv.h         | 372 +++++++
  drivers/accel/amdxdna/aie2_pci.c              | 756 ++++++++++++++
  drivers/accel/amdxdna/aie2_pci.h              | 264 +++++
  drivers/accel/amdxdna/aie2_psp.c              | 137 +++
  drivers/accel/amdxdna/aie2_smu.c              | 112 +++
  drivers/accel/amdxdna/aie2_solver.c           | 329 ++++++
  drivers/accel/amdxdna/aie2_solver.h           | 156 +++
  drivers/accel/amdxdna/amdxdna_ctx.c           | 597 +++++++++++
  drivers/accel/amdxdna/amdxdna_ctx.h           | 165 +++
  drivers/accel/amdxdna/amdxdna_drm.c           | 172 ++++
  drivers/accel/amdxdna/amdxdna_drm.h           | 114 +++
  drivers/accel/amdxdna/amdxdna_gem.c           | 700 +++++++++++++
  drivers/accel/amdxdna/amdxdna_gem.h           |  73 ++
  drivers/accel/amdxdna/amdxdna_mailbox.c       | 582 +++++++++++
  drivers/accel/amdxdna/amdxdna_mailbox.h       | 124 +++
  .../accel/amdxdna/amdxdna_mailbox_helper.c    |  50 +
  .../accel/amdxdna/amdxdna_mailbox_helper.h    |  43 +
  drivers/accel/amdxdna/amdxdna_pci_drv.c       | 234 +++++
  drivers/accel/amdxdna/amdxdna_pci_drv.h       |  31 +
  drivers/accel/amdxdna/amdxdna_sysfs.c         |  58 ++
  drivers/accel/amdxdna/npu1_regs.c             |  94 ++
  drivers/accel/amdxdna/npu2_regs.c             | 111 ++
  drivers/accel/amdxdna/npu4_regs.c             | 111 ++
  drivers/accel/amdxdna/npu5_regs.c             | 111 ++
  include/trace/events/amdxdna.h                | 101 ++
  include/uapi/drm/amdxdna_accel.h              | 456 +++++++++
  35 files changed, 8178 insertions(+)
  create mode 100644 drivers/accel/amdxdna/Kconfig
  create mode 100644 drivers/accel/amdxdna/Makefile
  create mode 100644 drivers/accel/amdxdna/TODO
  create mode 100644 drivers/accel/amdxdna/aie2_ctx.c
  create mode 100644 drivers/accel/amdxdna/aie2_error.c
  create mode 100644 drivers/accel/amdxdna/aie2_message.c
  create mode 100644 drivers/accel/amdxdna/aie2_msg_priv.h
  create mode 100644 drivers/accel/amdxdna/aie2_pci.c
  create mode 100644 drivers/accel/amdxdna/aie2_pci.h
  create mode 100644 drivers/accel/amdxdna/aie2_psp.c
  create mode 100644 drivers/accel/amdxdna/aie2_smu.c
  create mode 100644 drivers/accel/amdxdna/aie2_solver.c
  create mode 100644 drivers/accel/amdxdna/aie2_solver.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_drm.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_drm.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_gem.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_gem.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.c
  create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.h
  create mode 100644 drivers/accel/amdxdna/amdxdna_sysfs.c
  create mode 100644 drivers/accel/amdxdna/npu1_regs.c
  create mode 100644 drivers/accel/amdxdna/npu2_regs.c
  create mode 100644 drivers/accel/amdxdna/npu4_regs.c
  create mode 100644 drivers/accel/amdxdna/npu5_regs.c
  create mode 100644 include/trace/events/amdxdna.h
  create mode 100644 include/uapi/drm/amdxdna_accel.h


No Documentation?

Is it ok to add a work item to TODO and add documentation in later patches?

I beleive best practice would be to add Documnetation in the same patch/series that adds the functionality.  I'm not expecting Documentation for items not implemented in this series, however I think describing the product/architecture/other high level topics would help put the code in context during review.

It does seem like the AMD GPU driver had a lot of documentation, which makes the lack of documentation for the AMD Accel driver particularly odd.

Ok.  We will work on the document


Thanks,

Lizhi