RE: About upstreaming ArmChina NPU driver

From: Dejia Shang
Date: Wed Apr 03 2024 - 06:25:40 EST



> -----Original Message-----
> From: Oded Gabbay <oded.gabbay@xxxxxxxxx>
> Sent: 2024年4月3日 14:26
> To: Dejia Shang <Dejia.Shang@xxxxxxxxxxxx>
> Cc: ogabbay@xxxxxxxxxx; airlied@xxxxxxxxxx; daniel@xxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: About upstreaming ArmChina NPU driver
>
> On Thu, Mar 28, 2024 at 10:01 AM Dejia Shang <Dejia.Shang@xxxxxxxxxxxx>
> wrote:
> >
> > Dear Kernel Maintainers,
> >
> > I am a driver developer and would like to upstream the ArmChina Zhouyi
> NPU driver ("Zhouyi" is the brand) to accel subsystem.
> >
> > The driver is already open sourced (both UMD and KMD) and anyone can
> find the code from https://github.com/Arm-China/Compass_NPU_Driver.git.
> >
> > This driver is responsible for scheduling AI inference tasks to the NPU cores
> (V1/V2/V3). Specifically, a simplified end-to-end flow is:
> >
> > 1. A TFLite/ONNX model is transformed to an executable binary
> file in ELF format by the NN graph compiler (designed by ArmChina)
> > 2. An application loads the executable binary file to UMD and
> provides the input data.
> > 3. UMD parses the binary and sends ioctls to KMD (open device,
> do memory allocation/mmap/free, submit the job descriptor).
> > 4. KMD dispatches the job to NPU h/w, handles interrupts and
> updates the execution status.
> > 5. UMD polls the status of the pre-scheduled job.
> > 6. The application gets the output results.
> >
> > So...for the upstreaming,
> >
> > Q1: do you think our NPU driver is suitable for accel? If the answer is yes,
> which tree & branch should the patches be based on?
> Hi Dejia,
> Yes, it definitely sounds as a good fit to the accel subsystem.
> Please base your patches on "drm-misc-next" branch in drm-misc repo:
> https://anongit.freedesktop.org/git/drm/drm-misc.git
>

Hi Oded,
Got it.

> >
> > Q2: in thread
> https://lore.kernel.org/lkml/ec547d33-214f-4952-aa33-c271e9edad63@kern
> el.org/ showing a similar case, Oded mentioned that:
> >
> > "If we would have upstreamed a new driver, the expectation
> would have been that we would use some drm mechanisms.", and
> > "the minimal requirement is to use GEM/BOs for memory
> management operations".
> >
> > I guess those requirements are also applicable for the Zhouyi NPU KMD?
> Currently, the memory management (MM) in KMD is based on dma-mapping
> APIs, which handles both reserved CMA region(s) and SMMU mapped buffers,
> and supports the dma-buf framework. Maybe I should replace the
> implementations with DRM APIs.
> Yes, those requirements definitely apply here.
> >
> > Q3: if you have looked at the KMD code, do you think I should make any
> other major change before submitting the first patch series? Thank you!
> I took a quick glance. In general, it seems to be ok, but I noticed two things
> related to the integration with drm/accel:
>
> 1. You us a scheduler for the job submission, which provides the ability to
> defer jobs. In that case, I suggest to check if you can use drm_sched instead of
> your own implementation. No point in re-inventing the wheel.
> 2. You provide several memory zones for allocation of memory. I would
> suggest here to look at using ttm as the memory manager instead of
> re-implementing your own.

Thanks for your time! I will try to refactor the code as suggested and then send the first patch series.

>
> And please remove the IMPORTANT NOTICE at the end of your emails. I
> would have to refrain from answering to further emails if that notice remains.

Now fixed. I did not realize that because the server auto appended the notice. Sorry for the inconvenience.

Best Regards,
Dejia

>
> Thanks,
> Oded
>
> >
> > Thanks for your time and look forward to your reply~ 😊
> >
> > Best Regards,
> > Dejia