Re: [PATCH Xilinx Alveo 1/8] Documentation: fpga: Add a document describing Alveo XRT drivers

From: Moritz Fischer
Date: Wed Dec 02 2020 - 23:38:07 EST


Max,

On Thu, Dec 03, 2020 at 03:38:26AM +0000, Max Zhen wrote:
> [...cut...]
>
> > > > > +xclbin over the User partition as part of DFX. When a user
> > > > > +requests loading of a specific xclbin the xmgmt management
> > > > > +driver reads the parent interface UUID specified in the xclbin
> > > > > +and matches it with child interface UUID exported by Shell to
> > > > > +determine if xclbin is compatible
> > > > with the Shell. If match fails loading of xclbin is denied.
> > > > > +
> > > > > +xclbin loading is requested using ICAP_DOWNLOAD_AXLF ioctl
> > command.
> > > > > +When loading xclbin xmgmt driver performs the following operations:
> > > > > +
> > > > > +1. Sanity check the xclbin contents 2. Isolate the User
> > > > > +partition 3. Download the bitstream using the FPGA config engine (ICAP) 4.
> > > > > +De-isolate the User partition
> > > > Is this modelled as bridges and regions?
> > >
> > > Alveo drivers as written today do not use fpga bridge and region
> > framework. It seems that if we add support for that framework, it’s
> > possible to receive PR program request from kernel outside of xmgmt driver?
> > Currently, we can’t support this and PR program can only be initiated
> > using XRT’s runtime API in user space.
> >
> > I'm not 100% sure I understand the concern here, let me reply to what
> > I think I understand:
> >
> > You're worried that if you use FPGA region as interface to accept PR
> > requests something else could attempt to reconfigure the region from
> > within the kernel using the FPGA Region API?
> >
> > Assuming I got this right, I don't think this is a big deal. When you
> > create the regions you control who gets the references to it.
>
> Thanks for explaining. Yes, I think you got my point :-).

We can add code to make a region 'static' or 'one-time' or 'fixed'.
>
> >
> > From what I've seen so far Regions seem to be roughly equivalent to
> > Partitions, hence my surprise to see a new structure bypassing them.
>
> I see where the gap is.
>
> Regions in Linux is very different than "partitions" we have defined in xmgmt. Regions seem to be a software data structure representing an area on the FPGA that can be reprogrammed. This area is protected by the concept of "bridge" which can be disabled before program and reenabled after it. And you go through region when you need to reprogram this area.

Your central management driver can create / destroy regions at will. It
can keep them in a list, array or tree.

Regions can but don't have to have bridges.

If you need to go through the central driver to reprogram a region,
you can use that to figure out which region to program.
>
> The "partition" is part of the main infrastructure of xmgmt driver, which represents a group of subdev drivers for each individual IP (HW subcomponents). Basically, xmgmt root driver is parent of several partitions who is, in turn, the parent of several subdev drivers. The parent manages the life cycle of its children here.

I don't see how this is conceptually different from what DFL does, and
they managed to use Regions and Bridges.

If things are missing in the framework, please add them instead of
rewriting an entire parallel framework.

>
> We do have a partition to represent the group of subdevs/IPs in the reprogrammable area. And we also have partitions representing other areas which cannot be reprogrammed. So, it is difficult to use "Region" to implement "partition".

You implement your regions callbacks, you can return -EINVAL / -ENOTTY
if you want to fail a reprogramming request to a static partion /
region.

> From what you have explained, it seems that even if I use region / bridge in xmgmt, we can still keep it private to xmgmt instead of exposing the interface to outside world, which we can't support anyway? This means that region will be used as an internal data structure for xmgmt. Since we can't simply replace partition with region, we might as well just use partition throughout the driver, instead of introducing two data structures and use them both in different places.

Think about your partition as an extension to a region that implements
what you need to do for your case of enumerating and reprogramming that
particular piece of your chip.

> However, if using region/bridge can bring in other benefits, please let us know and we could see if we can also add this to xmgmt.

As maintainer I can say it brings the benefit of looking like existing
infrastructure we have. We can add features to the framework as needed
but blanket replacing the entire thing is always a hard sell.
>
> > >
> > > Or maybe we have missed some points about the use case for this
> > framework?
> > >
>
> [...cut...]
>
> > > > > +-----------------
> > > > > +
> > > > > +As mentioned previously xsabin stores metadata which advertise
> > > > > +HW
> > > > subsystems present in a partition.
> > > > > +The metadata is stored in device tree format with well defined
> > > > > +schema. Subsystem instantiations are captured as children of
> > > > > +``addressable_endpoints`` node. Subsystem nodes have standard
> > > > attributes like ``reg``, ``interrupts`` etc. Additionally the
> > > > nodes also have PCIe specific attributes:
> > > > > +``pcie_physical_function`` and ``pcie_bar_mapping``. These
> > > > > +identify which PCIe physical function and which BAR space in
> > > > > +that physical function the subsystem resides. XRT management
> > > > > +driver uses this information to bind *platform drivers* to the
> > > > > +subsystem instantiations. The platform drivers are found in
> > > > > +**xrt-lib.ko** kernel module defined later. Below is an example
> > > > > +of device tree for Alveo U50
> > > > > +platform::
> > > >
> > > > I might be missing something, but couldn't you structure the
> > > > addressable endpoints in a way that encode the physical function
> > > > as a parent / child relation?
> > >
> > > Alveo driver does not generate the metadata. The metadata is
> > > formatted
> > and generated by HW tools when the Alveo HW platform is built.
> >
> > Sure, but you control the tools that generate the metadata :) Your
> > userland can structure / process it however it wants / needs?
>
> XRT is a runtime software stack, it is not responsible for generating HW metadata. It is one of the consumers of these data. The shell design is generated by a sophisticated tool framework which is difficult to change.

The Kernel userspace ABI is not going to change once it is merged, which
is why we need to get it right. You can change your userspace code long
time after it is merged into the kernel. The otherway round does not
work.

If you're going to do device-tree you'll need device-tree maintainers to
be ok with your bindings.

> However, we will take this as a feedback for future revision of the tool.
>
> Thanks,
> Max

Btw: Can you fix your line-breaks :)

- Moritz