On Wed, May 24, 2023 at 01:27:00PM +0300, Oded Gabbay wrote:What is letting you think that we want to bypass open source requirements ?
On Wed, May 24, 2023 at 2:34 AM Kevin Hilman <khilman@xxxxxxxxxxxx> wrote:
Maybe model it according to the tiny driver in drm display ? You can
Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> writes:
On 5/17/2023 8:52 AM, Alexandre Bailon wrote:
This adds a DRM driver that implements communication between the CPU and an
APU. The driver target embedded device that usually run inference using some
prebuilt models. The goal is to provide common infrastructure that could be
re-used to support many accelerators. Both kernel, userspace and firmware tries
to use standard and existing to leverage the development and maintenance effort.
The series implements two platform drivers, one for simulation and another one for
the mt8183 (compatible with mt8365).
This looks like the 3 existing Accel drivers. Why is this in DRM?
Yes, this belongs in accel. I think Alex had some issues around the
infra in accel with device nodes not appearing/opening properly, but
I'll let him comment there. But either way, the right approach should
be to fix any issues in accel and move it there.
[...]
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 ++
drivers/gpu/drm/Kconfig | 2 +
drivers/gpu/drm/Makefile | 1 +
drivers/gpu/drm/apu/Kconfig | 22 +
drivers/gpu/drm/apu/Makefile | 10 +
drivers/gpu/drm/apu/apu_drv.c | 282 +++++++++
drivers/gpu/drm/apu/apu_gem.c | 230 +++++++
drivers/gpu/drm/apu/apu_internal.h | 205 ++++++
drivers/gpu/drm/apu/apu_sched.c | 592 ++++++++++++++++++
drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++
include/uapi/drm/apu_drm.h | 81 +++
"apu" seems too generic. We already have 3 "AI processing units" over
in drivers/accel already...
Indeed, it is generic, but that's kind of the point for this driver
since it's targetted at generalizing the interface with "AI processing
units" on a growing number of embedded SoCs (ARM, RISC-V, etc.) In
addition, the generic naming is intentional because the goal is bigger
than the kernel and is working towards a generic, shared "libAPU"
userspace[1], but also common firmware for DSP-style inference engines
(e.g. analgous Sound Open Firmware for audio DSPs.)
As usual, the various SoC vendors use different names (APU, NPU, NN
unit, etc.) but we'd like a generic name for the class of devices
targetted by this driver. And unfortunately, it looks like the equally
generic "Versatile processing unit" is already taken Intel's
drivers/accel/ivpu. :)
Maybe since this is more about generalizing the interface between the
CPU running linux and the APU, what about the name apu_if? But I guess
that applies to the other 3 drivers in drivers/accell also. Hmmm...
Naming things is hard[2], so we're definitly open to other ideas. Any
suggestions?
then call it tiny_apu :-)
Disclosure: It was Daniel's suggestion, he can chime in with more
details on the tiny driver concept.
Yeah so maybe a bit more detail on my thoughts:
First this smells like a need bypass of the entire "we want open userspace
for accel drivers" rule. The rule isn't quite a strict as for drm gpu
drivers (not sure we ended up documenting exactly what, but iirc the
consensus was that for build-time only dependencies we're ok with
downstream compilers), but it's still there.
To be honest, I don't know what would be required for training engines.
And at least from a quick look apu.ko and libapu just look like a generic
accel interface, and that's not enough.
For the big training engines it's more or less "enough to run pytorch, but
it can be really slow", not sure what the right standard for these
inference-only drivers should be.
This makes sense to me.
So that's the first reason why I don't like this.
The other is that I think if we do end up with a pile of tiny accel
drivers, we should probably look into something like simmpledrm for the
tiny display drivers. Probably still IP specific ioctls (at least most) so
that IP specific job knows and all that are easy, but then just pass to a
framework that simplifies a drm gem driver to "write ptes" and "run job"
callback, maybe with an optional "create/destroy vm/ctx" for hw which can
do that.
So maybe we end up with a drivers/accel/tiny and a bunch more helpers
around the existing gem ones. The rule we have for drm/tiny is "1 file,
less than 1kloc", and there's a bunch of them. I do think we can achieve
the same for tiny accel inference engines (but it's still a bit a road).
Maybe tiny accel is more like "less than 5kloc" since you need a bit more
glue for the driver specific ioctl stuff - maybe that's only needed for
the submit ioctl, maybe also for buffer map/unmap and creation.
I wrote this series a long time ago and just rebased it recently.
Also note that there's an entire pile of in-flight work for adding new
helpers to the gem world to make this all easier. Once we have gpuva and
exec helpers there not much glue left to tie it all together with the
scheduler.
We are currently stuck with closed source fimrware, userspace applications and toolchains (works with android and linux).
But the real crux is that an accel inference driver really needs to have
enough userspace to do an actual inference job with some
android/cros/whatever framework for inference (there's just too many).
-Daniel
Oded
Kevin
[1] https://gitlab.baylibre.com/baylibre/libapu/libapu
[2]
"There are 2 hard problems in computer science: cache invalidation,
naming things and off-by-1 errors."
-- https://twitter.com/secretGeek/status/7269997868