Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver

From: Dave Airlie
Date: Fri Mar 29 2019 - 00:56:33 EST


On Thu, 28 Mar 2019 at 10:14, Sonal Santan <sonals@xxxxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of Daniel Vetter
> > Sent: Wednesday, March 27, 2019 7:12 AM
> > To: Sonal Santan <sonals@xxxxxxxxxx>
> > Cc: Daniel Vetter <daniel@xxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> > gregkh@xxxxxxxxxxxxxxxxxxx; Cyril Chemparathy <cyrilc@xxxxxxxxxx>; linux-
> > kernel@xxxxxxxxxxxxxxx; Lizhi Hou <lizhih@xxxxxxxxxx>; Michal Simek
> > <michals@xxxxxxxxxx>; airlied@xxxxxxxxxx
> > Subject: Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator driver
> >
> > On Wed, Mar 27, 2019 at 12:50:14PM +0000, Sonal Santan wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Daniel Vetter [mailto:daniel@xxxxxxxx]
> > > > Sent: Wednesday, March 27, 2019 1:23 AM
> > > > To: Sonal Santan <sonals@xxxxxxxxxx>
> > > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx;
> > > > Cyril Chemparathy <cyrilc@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> > > > Lizhi Hou <lizhih@xxxxxxxxxx>; Michal Simek <michals@xxxxxxxxxx>;
> > > > airlied@xxxxxxxxxx
> > > > Subject: Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe accelerator
> > > > driver
> > > >
> > > > On Wed, Mar 27, 2019 at 12:30 AM Sonal Santan <sonals@xxxxxxxxxx>
> > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx] On Behalf Of
> > > > > > Daniel Vetter
> > > > > > Sent: Monday, March 25, 2019 1:28 PM
> > > > > > To: Sonal Santan <sonals@xxxxxxxxxx>
> > > > > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx;
> > > > > > Cyril Chemparathy <cyrilc@xxxxxxxxxx>;
> > > > > > linux-kernel@xxxxxxxxxxxxxxx; Lizhi Hou <lizhih@xxxxxxxxxx>;
> > > > > > Michal Simek <michals@xxxxxxxxxx>; airlied@xxxxxxxxxx
> > > > > > Subject: Re: [RFC PATCH Xilinx Alveo 0/6] Xilinx PCIe
> > > > > > accelerator driver
> > > > > >
> > > > > > On Tue, Mar 19, 2019 at 02:53:55PM -0700,
> > > > > > sonal.santan@xxxxxxxxxx
> > > > wrote:
> > > > > > > From: Sonal Santan <sonal.santan@xxxxxxxxxx>
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > This patch series adds drivers for Xilinx Alveo PCIe accelerator cards.
> > > > > > > These drivers are part of the Xilinx Runtime (XRT) open source
> > > > > > > stack and have been deployed by leading FaaS vendors and many
> > > > > > > enterprise customers.
> > > > > >
> > > > > > Cool, first fpga driver submitted to drm! And from a high level
> > > > > > I think this makes a lot of sense.
> > > > > >
> > > > > > > PLATFORM ARCHITECTURE
> > > > > > >
> > > > > > > Alveo PCIe platforms have a static shell and a reconfigurable
> > > > > > > (dynamic) region. The shell is automatically loaded from PROM
> > > > > > > when the host is booted and PCIe is enumerated by the BIOS. The
> > > > > > > shell cannot be changed until the next cold reboot. The shell
> > > > > > > exposes two physical functions: the management physical function
> > > > > > > and the user physical function.
> > > > > > >
> > > > > > > Users compile their high-level design in C/C++/OpenCL or RTL
> > > > > > > into an FPGA image using the SDx compiler. The FPGA image,
> > > > > > > packaged as an xclbin file, can be loaded onto the reconfigurable
> > > > > > > region. The image may contain one or more compute units. Users
> > > > > > > can dynamically swap the full image running on the reconfigurable
> > > > > > > region in order to switch between different workloads.
> > > > > > >
> > > > > > > XRT DRIVERS
> > > > > > >
> > > > > > > The XRT Linux kernel driver xmgmt binds to the mgmt pf. The
> > > > > > > driver is modular and organized into several platform drivers
> > > > > > > which primarily handle the following functionality:
> > > > > > > 1. ICAP programming (FPGA bitstream download with FPGA Mgr
> > > > > > >    integration)
> > > > > > > 2. Clock scaling
> > > > > > > 3. Loading the firmware container, also called dsabin (embedded
> > > > > > >    Microblaze firmware for ERT and XMC, optional clearing
> > > > > > >    bitstream)
> > > > > > > 4. In-band sensors: temp, voltage, power, etc.
> > > > > > > 5. AXI Firewall management
> > > > > > > 6. Device reset and rescan
> > > > > > > 7. Hardware mailbox for communication between the two physical
> > > > > > >    functions
> > > > > > >
> > > > > > > The XRT Linux kernel driver xocl binds to the user pf. Like its
> > > > > > > peer, this driver is also modular and organized into several
> > > > > > > platform drivers which handle the following functionality:
> > > > > > > 1. Device memory topology discovery and memory management
> > > > > > > 2. Buffer object abstraction and management for client processes
> > > > > > > 3. XDMA MM PCIe DMA engine programming
> > > > > > > 4. Multi-process aware context management
> > > > > > > 5. Compute unit execution management (optionally with help of
> > > > > > >    ERT) for client processes
> > > > > > > 6. Hardware mailbox for communication between the two physical
> > > > > > >    functions
> > > > > > >
> > > > > > > The drivers export ioctls and sysfs nodes for various services.
> > > > > > > The xocl driver makes heavy use of DRM GEM features for device
> > > > > > > memory management, reference counting, mmap support and
> > > > > > > export/import. xocl also includes a simple scheduler called KDS
> > > > > > > which schedules compute units and interacts with the hardware
> > > > > > > scheduler running ERT firmware. The scheduler understands custom
> > > > > > > opcodes packaged into command objects and provides an
> > > > > > > asynchronous command-done notification via POSIX poll.
> > > > > > >
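(For reference: the command-done notification via POSIX poll mentioned above
maps to the usual pattern of blocking in poll(2) on the device node after
submitting a command. A minimal, hypothetical sketch -- the execbuf ioctl in
the comment is only a placeholder, not the actual xocl uAPI, and the render
node path is assumed:)

    /* Hypothetical sketch only: submit a command object, then block in
     * poll(2) until the driver reports command completion.  The ioctl in
     * the comment below is a placeholder; the real name and argument
     * struct come from the xocl uapi header. */
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed xocl node */
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* ioctl(fd, DRM_IOCTL_XOCL_EXECBUF, &exec_args);   <-- placeholder */

            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            if (poll(&pfd, 1, 1000 /* ms */) > 0 && (pfd.revents & POLLIN))
                    printf("command done notification received\n");

            close(fd);
            return 0;
    }
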
> > > > > > > More details on architecture, software APIs, ioctl
> > > > > > > definitions, execution model, etc. are available as Sphinx
> > > > > > > documentation--
> > > > > > >
> > > > > > > https://xilinx.github.io/XRT/2018.3/html/index.html
> > > > > > >
> > > > > > > The complete runtime software stack (XRT), which includes
> > > > > > > out-of-tree kernel drivers, user-space libraries, board utilities
> > > > > > > and firmware for the hardware scheduler, is open source and
> > > > > > > available at https://github.com/Xilinx/XRT
> > > > > >
> > > > > > Before digging into the implementation side more I looked into
> > > > > > the userspace here. I admit I got lost a bit, since there are lots
> > > > > > of indirections and abstractions going on, but it seems like
> > > > > > this is just a fancy ioctl wrapper/driver backend abstraction.
> > > > > > Not really something applications would use.
> > > > > >
> > > > >
> > > > > Appreciate your feedback.
> > > > >
> > > > > The userspace libraries define a common abstraction but have
> > > > > different implementations for the Zynq UltraScale+ embedded platform,
> > > > > PCIe-based Alveo (and FaaS), and emulation flows. The latter lets
> > > > > you run your application without physical hardware.
> > > > >
> > > > > >
> > > > > > From the pretty picture on github it looks like there's some
> > > > > > opencl/ml/other fancy stuff sitting on top that applications
> > > > > > would use. Is that also available?
> > > > >
> > > > > The full OpenCL runtime is available in the same repository.
> > > > > Xilinx ML Suite is also based on XRT and its source can be found
> > > > > at https://github.com/Xilinx/ml-suite.
> > > >
> > > > Hm, I did a few git greps for the usual OpenCL entry points, but
> > > > didn't find anything. Do I need to run some build scripts first
> > > > (which download additional source code)? Or is there some symbol
> > > > mangling going on and that's why I don't find anything? Pointers very
> > > > much appreciated.
> > >
> > > The bulk of the OCL runtime code can be found inside
> > > https://github.com/Xilinx/XRT/tree/master/src/runtime_src/xocl.
> > > The OCL runtime also includes
> > > https://github.com/Xilinx/XRT/tree/master/src/runtime_src/xrt.
> > > The OCL runtime library, called libxilinxopencl.so, in turn uses XRT
> > > APIs to talk to the drivers.
> > > For PCIe these XRT APIs are implemented in the library libxrt_core.so,
> > > the source for which is at
> > > https://github.com/Xilinx/XRT/tree/master/src/runtime_src/driver/xclng/xrt.
> > >
> > > You can build a fully functioning runtime stack by following very
> > > simple build instructions--
> > > https://xilinx.github.io/XRT/master/html/build.html
> > >
> > > We do have a few dependencies on standard Linux packages including a
> > > few OpenCL packages bundled by Linux distros: ocl-icd, ocl-icd-devel
> > > and opencl-headers
> >
> > Thanks a lot for the pointers. No idea why I didn't find this stuff, I guess I was
> > blind.
> >
> > The thing I'm really interested in is the compiler, since at least the
> > experience from GPUs says that it very much is part of the overall uapi, and
> > is definitely needed to be able to make any changes to the implementation.
> > Looking at clCreateProgramWithSource there's only a lookup of cached
> > compiles (it looks for xclbin), and src/runtime_src/xclbin doesn't look like
> > it provides a compiler either. It seems like apps need to precompile everything
> > first. Am I again missing something, or is this how it's supposed to work?
> >
> XRT works with precompiled binaries which are produced by the Xilinx SDx
> compiler, xocc. The binary (xclbin) is loaded by clCreateProgramWithBinary().
>
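(For reference, a minimal sketch of that load path using the standard OpenCL
clCreateProgramWithBinary() entry point -- the xclbin file name and device
selection below are placeholders, and error handling is trimmed:)

    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            cl_platform_id platform;
            cl_device_id device;
            clGetPlatformIDs(1, &platform, NULL);
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);
            cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

            /* Read the precompiled xclbin produced offline by xocc. */
            FILE *f = fopen("kernel.xclbin", "rb");
            fseek(f, 0, SEEK_END);
            size_t size = ftell(f);
            rewind(f);
            unsigned char *binary = malloc(size);
            fread(binary, 1, size, f);
            fclose(f);

            /* Hand the binary to the runtime; no online compilation happens. */
            cl_int status, err;
            cl_program prog = clCreateProgramWithBinary(ctx, 1, &device, &size,
                                (const unsigned char **)&binary, &status, &err);
            clBuildProgram(prog, 1, &device, NULL, NULL, NULL);

            printf("xclbin load %s\n", err == CL_SUCCESS ? "succeeded" : "failed");
            free(binary);
            return 0;
    }
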
> > Note: There's no expectation of a fully optimizing compiler, and we're
> > totally ok if there's an optimizing proprietary compiler and a basic open one
> > (AMD and a bunch of other companies all have such dual stacks running on top
> > of drm kernel drivers). But a basic compiler that can convert basic kernels into
> > machine code is expected.
> >
> Although the compiler is not open source, the compilation flow lets users examine
> the output of the various stages. For example, if you write your kernel in
> OpenCL/C/C++ you can view the RTL (Verilog/VHDL) output produced by the first
> stage of compilation. Note that the compiler is really generating a custom circuit
> from a high-level input, which in the last phase gets synthesized into a bitstream.
> Expert hardware designers can handcraft a circuit in RTL and feed it to the
> compiler. Our FPGA tools let you view the generated hardware design, the register
> map, etc. You can get more information about a compiled design by running an XRT
> tool like xclbinutil on the generated file.
>
> In essence, compiling for FPGAs is quite different from compiling for a
> GPU/CPU/DSP. Interestingly, FPGA compilers can take anywhere from 30 minutes to a
> few hours to compile a testcase.

So is there any open source userspace generator for what this
interface provides? Is the bitstream format that gets fed into the
FPGA proprietary and is it signed?

Dave.