Re: [PATCH v3 0/9] Xilinx AI engine kernel driver

From: Daniel Vetter
Date: Thu Dec 17 2020 - 05:07:18 EST


On Thu, Dec 17, 2020 at 9:40 AM Jiaying Liang <wendy.liang@xxxxxxxxxx> wrote:
>
>
> On 12/15/20 7:23 AM, Alex Deucher wrote:
> > On Mon, Dec 14, 2020 at 7:24 PM Jiaying Liang<wendy.liang@xxxxxxxxxx> wrote:
> >> On 12/11/20 11:39 AM, Daniel Vetter wrote:
> >>> Hi all
> >>>
> >>> On Fri, Dec 11, 2020 at 8:03 PM Alex Deucher<alexdeucher@xxxxxxxxx> wrote:
> >>>> On Mon, Nov 30, 2020 at 3:25 AM Wendy Liang<wendy.liang@xxxxxxxxxx> wrote:
> >>>>> AI engine is the acceleration engine provided by Xilinx. These engines
> >>>>> provide high compute density for vector-based algorithms, and flexible
> >>>>> custom compute and data movement. It has core tiles for compute and
> >>>>> shim tiles to interface the FPGA fabric.
> >>>>>
> >>>>> You can check the AI engine architecture document for more hardware details:
> >>>>> https://www.xilinx.com/support/documentation/architecture-manuals/am009-versal-ai-engine.pdf
> >>>>>
> >>>>> This patch series adds a Linux kernel driver to manage the Xilinx AI
> >>>>> engine array device and AI engine partitions (groups of AI engine tiles
> >>>>> dedicated to an application).
> >>>> Hi Wendy,
> >>>>
> >>>> I think it would be good to provide an overview of how your stack
> >>>> works in general. That would give reviewers a better handle on how
> >>>> all of this fits together. I'd suggest including an overview in the
> >>>> cover letter and also in the commit message and/or as a comment in the
> >>>> code in one of the patches. I'm not really an expert when it comes to
> >>>> FPGAs, but this basically looks like a pretty low level interface to
> >>>> set up the data fabric for a kernel that will run on the soft logic or
> >>>> maybe the microcontroller on the board. It doesn't have to be super
> >>>> detailed, just a nice flow for how you might use this. E.g.,
> >>>>
> >>>> Userspace uses ioctls X, Y, Z to configure the data fabric for the
> >>>> FPGA kernel. The kernels can run on... . DMA access to system memory
> >>>> for data sets can be allocated using ioctl A. DMA access is limited
> >>>> by... . The user can then load the FPGA kernel on to one of the
> >>>> engines using ioctl B and finally they can kick off the whole thing
> >>>> using ioctl C. FPGA kernels are compiled using YYY toolchain and use
> >>>> use the following runtime (link to runtime) to configure the data
> >>>> fabric using ioctls X, Y, Z.
> >>> At least for drm drivers we ideally have that as a .rst file in
> >>> Documentation/. With that you can even do full svg graphs, or just dot
> >>> graphs, of the overall stack if you really want to go overboard :-)
> >>>
> >>>> It would also be good to go over the security implications of the
> >>>> design. E.g., can the FPGA kernel(s) access the DMA engine directly,
> >>>> or is it limited to just the DMA regions set up by the ioctls? Also,
> >>>> does the hardware and software design allow for multiple users? If
> >>>> so, how does that work?
> >>> I've also seen indications that there's some on-chip or on-card
> >>> memory. How that's planned to be used and whether we want to manage
> >>> this (maybe even with something like ttm) would be good to understand.
> >>>
> >>> All excellent questions from Alex, just figured I add some more.
> >>>
> >>> Cheers, Daniel
> >> Hi Alex, Daniel,
> >>
> >> Below is an overview of the driver.
> >>
> >> AI engine kernel driver manages Xilinx AI engine device. An AI engine device
> >> contains cores tiles and SHIM tiles. Core tiles are the computation tiles
> >> , the SHIM tiles are the tiles interfacing to external components.
> >>
> >> +--------+--------+--------+--------+
> >> | Core | Core | Core | Core | ...
> >> | | | | |
> >> +-----------------------------------+
> >> | Core | Core | Core | Core | ...
> >> | | | | |
> >> +--------+--------+--------+---------
> >> ...
> >> +--------+--------+-----------------+
> >> | SHIM | SHIM | SHIM |SHIM |
> >> | PL | PL | PL |PL | NOC |
> >> +---+----+---+----+---+-----+-------+
> >> AXI Streams | | | | |AXI MM
> >> | | | | |
> >> Events Singals | | | | |
> >> | | | | |
> >> | | | | |
> >> +---+--------+--------+-----+ +--+------+
> >> | FPGA | |
> >> NOC |
> >> | | | |
> >> +---------------------------+ +--+-------+
> >> |
> >> |
> >> +---+------+
> >> | DDR |
> >> +----------+
> >>
> >> Each Core tile contains computing module, local memory and DMA module. The
> >> local memory DMA module takes data from or to the AXI streams and writes
> >> it to or reads it from the local memory. The computing module can also
> >> directly get/put data from/to the AXI stream. The AIE SHIM enables AIE tiles
> >> to get/put data from/to AXI streams from FPGA, enables external master to
> >> access AI engine address space through AXI MM. SHIM NoC module has DMA
> >> engine,
> >> which can access extern memory though AXI MM and push it to internal AXI
> >> streams.
> >>
> >> At runtime, the AI engine tiles interconnection needs to be configured
> >> so that
> >> it can get fetch data from external components or adjacent tiles, and AI
> >> engine
> >> core program needs to be loaded. And then user application can push data
> >> to the
> >> AI engine array and start/stop AI engine core. AI engine device errors
> >> can be
> >> raised as events, the AI engine kernel driver listens to the events
> >> interrupt
> >> to monitor runtime async device errors.
> >>
> >> Instead of application directly interacting with the AI engine kernel
> >> APIs, user
> >> application/libraries interacts with AI engine userspace library:
> >> https://github.com/Xilinx/embeddedsw/tree/master/XilinxProcessorIPLib/drivers/aienginev2
> >> It provides cross OSes low level functional abstraction such as how to
> >> connect one
> >> stream port to another stream port, how to configure core tile local DMA.
> >>
> >> The AI engine library can be used by other runtime libraries such as
> >> Xilinx runtime (XRT)
> >> library:https://xilinx.github.io/XRT/master/html/index.html,
> >> which provides acceleration abstraction for Xilinx accelerators, it has
> >> extensions
> >> to interface to other acceleration framework such as OpenCL.
> >> XRT provides buffer handling abstractions for user application to share
> >> data between
> >> applicaiton and devices.
> >>
> >> Here is an example of application runtime stack:
> >>
> >> +----------------------------+
> >> | Application |
> >> | |
> >> +----------------------------+
> >> | XRT |
> >> | |
> >> +----------------------------+
> >> | AIE Library |
> >> | |
> >> +----------------------------+
> >> +----------------------------------------+
> >> Kern +----------------------------+
> >> | AIE Partition +--+
> >> +----------------------------+ |
> >> |----------------------------+
> >> +----------------------------+
> >> | AIE Device |
> >> | |
> >> +----------------------------+
> >>
> >>
> >>
> >> The AI engine kernel driver provides the following user interfaces:
> >> * AIE device driver is the root device driver to manage the partitions of
> >> of the AI engine device array. AI engine array can be partitioned into
> >> column wised isolated partitions. Each applicaiton can only access its
> >> own partitions.
> >> * AIE device driver monitors the interrupt from the AI enigne device. All
> >> AI engine tiles shared the same interrupt for error events.
> >> * AIE partition driver controls address mapping and access of the
> >> registers/local memories of the tiles within a partition.
> >> * It provides mmap operation to enable application to direclty
> >> access the
> >> tiles local memories for small data update such as parameter
> >> update for
> >> performance.
> >> * It provides mmap operatio to map all the registers as readonly for
> >> application to poll registers efficiently to check status.
> >> * It provides ioctl for userspace to pass I/O commands to write/mask
> >> write
> >> the registers. How to configure is defined by userspace. Userspace
> >> will
> >> pass the I/O commands sequence to the kernel driver, and kernel driver
> >> will validate the commands before it writes to the registers.
> >> * It provides ioctl to import dmabuf and ioctl to configure the the
> >> DMA module
> >> in the SHIM tile which can access memory outside AI engine array.
> >>
> >> The buffer management is out of this driver. In the above example, user
> >> application
> >> uses Xilinx runtime(XRT), XRT is the one to manage the buffers.
> >>
> > So if I understand this correctly, this driver handles the resource
> > management for the AI engines, PLs (programmable logic), and DMA
> > streams. I think it's important to understand that there are multiple
> > address spaces here. Normally when we talk about DMA in the kernel we
> > are referring to devices accessing an external resource like system
> > memory on the host CPU or another device's MMIO space (e.g., another
> > PCIe device). It would be good to clarify which address spaces the
> > DMAs in your diagram refer to. I think the DMAs in the AI engines are
> > specifically for DMAs within the AI engine logic (e.g., between AIs in
> > a partition). How is DMA to system memory handled? What about
> > dedicated memory on the FPGA (e.g., HBM or DDR on the FPGA itself)?
> > Is that what you are exposing as DMA bufs? When you allocate a
> > DMA-buf for a partition, is that partition only allowed to access
> > memory that is part of that DMA buf? I presume there is some
> > scatter/gather table that sets up the DMA range that the partition can
> > access? Who loads the soft logic (Is that the PL or some other IP)?
> > Is the soft logic partitioned as well? If I had some soft logic I
> > wanted to run on the FPGA, what would the kernel driver interaction
> > sequence look like? Maybe using the OpenCL soft logic would be a good
> > example. E.g.,
>
> The AI engine driver only manage the resources within the AI
>
> engine array. There are two types of DMAs of the AI engine device.
>
> one is the AI engine tile local memory DMA which can only access the local
>
> memory. There is another type of DMA which is in the SHIM tile. This
>
> DMA can access external address space such as DDR. Although it can acess
>
> the memory on fpga if user configure the platform that way, it is
> preferred to
>
> use PL data mover to move data between FPGA memory and AI engine device.
>
> The PL data mover will not be managed by the AI engine driver.
>
> One SHIM DMA has up to 16 buffer descriptors to use.
>
> Xilinx FPGA manager is the one used to program the FPGA soft logic.
>
> E.g. when XRT is used, if AI engine is connected to FPGA logic, the XRT
> stack is
>
> the one to manage the configuration sequence.
>
> > 1. user has soft logic blob generated by their soft logic compiler (is
> > this compiler open source?)
> The soft logic blob is generated by Xilinx tools which is not open
> source yet.
> > 2. user calls AI engine kernel driver to allocate the required
> > resources (AI engines, AI engine DMAs, doorbells of some sort? etc.)
>
> User will call AI engine kernel driver to allocate required resources within
>
> the AI engine array at runtime.
>
> However the patches for it is not in this patch set.
>
> > 3. user calls AI engine kernel driver to allocate system memory and/or
> > FGPA memory that can be used by the soft logic blob
>
> AI engine kernel driver doesn't allocate system memory. User can use other
>
> kernel driver to allocate memory.
>
> E.g. when XRT is used, user calls XRT kernel driver (zocl) to allocate
> system memory.
>
> So far, the FPGA memory is usually assigned to a soft data mover when
> the platform is
>
> created. Are you considering to have the FPGA memory in the DMA pool of the
>
> system? If it is dedicated to a device, can reserved memory solve this
> problem?
>
> The AI engine kernel driver doesn't consider this yet.
>
> > 4. user calls AI engine kernel driver to load soft logic
>
> I assume you are referring to the soft logic on the FPGA side which is not
>
> part of the AI engine device. FPGA manager is the one to load the soft
> logic on FPGA.
>
> > 5. user interfaces with soft logic (how? presumably via some memory
> > resource allocated in 2 and 3?)
>
> I assume you are referring to the soft logic on the FPGA side (not the
> AI engine device)
>
> The user interface with soft logic is managed by the soft logic IP driver.
>
> Each soft logic has some memory mapped control registers. User can
> access those
>
> registers through the soft logic IP driver.
>
> About memory allocation, I think it is better to manage the shared
> memory out of
>
> a specific device driver. Are you looking for memory management which covers
>
> both the system memory and fpga memory, and the device can specify which
> memory
>
> it prefers?

Ok, I think the picture is getting clearer. But now I'm wondering why
you have any interactions with dma-buf in this patch series here?
-Daniel


> Thanks,
>
> Wendy
>
> >
> > Thanks,
> >
> > Alex
> >
> >
> >> Best Regards,
> >>
> >> Wendy
> >>
> >>>> Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> v3:
> >>>>> * unlock AIE dev mutex after failed to gain the partition lock in
> >>>>> errors handing
> >>>>> * replace pointer with __u64 and enum with __u32 in ioctl
> >>>>>
> >>>>> v2:
> >>>>> * Fix dtschema check errors
> >>>>> * Fix test bot warning on interrupt implementation. Removed set but
> >>>>> unused varaible.
> >>>>> * Fix compilation unused function warning of firmware change in case
> >>>>> ZynqMP firmware is not configured
> >>>>> * There are other warning on ZynqMP firmware reported from testbot
> >>>>> which is not introduced by this patch set.
> >>>>> "[PATCH] firmware: xlnx-zynqmp: fix compilation warning" is submitted
> >>>>> for those fixes.
> >>>>>
> >>>>>
> >>>>> Izhar Ameer Shaikh (1):
> >>>>> firmware: xilinx: Add IOCTL support for AIE ISR Clear
> >>>>>
> >>>>> Nishad Saraf (2):
> >>>>> misc: xilinx-ai-engine: Add support to request device management
> >>>>> services
> >>>>> misc: xilinx-ai-engine: Add support for servicing error interrupts
> >>>>>
> >>>>> Wendy Liang (6):
> >>>>> dt-binding: soc: xilinx: ai-engine: Add AI engine binding
> >>>>> misc: Add Xilinx AI engine device driver
> >>>>> misc: xilinx-ai-engine: Implement AI engine cleanup sequence
> >>>>> misc: xilinx-ai-engine: expose AI engine tile memories to userspace
> >>>>> misc: xilinx-ai-engine: add setting shim dma bd operation
> >>>>> misc: xilinx-ai-engine: add request and release tiles
> >>>>>
> >>>>> .../bindings/soc/xilinx/xlnx,ai-engine.yaml | 126 ++++
> >>>>> MAINTAINERS | 8 +
> >>>>> drivers/firmware/xilinx/zynqmp.c | 14 +
> >>>>> drivers/misc/Kconfig | 12 +
> >>>>> drivers/misc/Makefile | 1 +
> >>>>> drivers/misc/xilinx-ai-engine/Makefile | 16 +
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-aie.c | 608 +++++++++++++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-clock.c | 245 ++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-dev.c | 496 ++++++++++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-dma.c | 481 +++++++++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-internal.h | 519 ++++++++++++++++
> >>>>> .../misc/xilinx-ai-engine/ai-engine-interrupt.c | 659 +++++++++++++++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-mem.c | 275 +++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-part.c | 635 ++++++++++++++++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-res.c | 219 +++++++
> >>>>> drivers/misc/xilinx-ai-engine/ai-engine-reset.c | 159 +++++
> >>>>> include/linux/firmware/xlnx-zynqmp.h | 8 +
> >>>>> include/uapi/linux/xlnx-ai-engine.h | 238 ++++++++
> >>>>> 18 files changed, 4719 insertions(+)
> >>>>> create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/Makefile
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-clock.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dev.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dma.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-internal.h
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-interrupt.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-mem.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-part.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-res.c
> >>>>> create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-reset.c
> >>>>> create mode 100644 include/uapi/linux/xlnx-ai-engine.h
> >>>>>
> >>>>> --
> >>>>> 2.7.4
> >>>>>
> >>>>> _______________________________________________
> >>>>> dri-devel mailing list
> >>>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>> _______________________________________________
> >>>> dri-devel mailing list
> >>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch