Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing accelerator driver

From: Hans Verkuil
Date: Wed Jun 29 2022 - 09:44:38 EST


My apologies for the late reply...

On 01/06/2022 03:40, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
> Hi Hans,
>
> Thank you for your advice.
> I prepared some description of DNN accelerator and its usage.
>
> #### Handling memory blocks for Visconti5 accelerators
>
> Visconti5 Image-Processing-Accelerators (IPAs) do not have a fine-grained IOMMU as the CPU does.
> Therefore, memory regions passed to the accelerators must be physically contiguous.
> We use DMA-BUF backed by CMA (Contiguous Memory Allocator) to allocate memory regions shared between the CPU and the IPAs.
> Originally, in the v4.19-based implementation, the ION allocator was used to allocate DMA-BUF instances.
> The latest implementation uses DMA-BUF heaps.
>
> Two structure types are used to represent memory regions passed to the drivers.
> * struct drv_ipa_buffer_info
>     * describes a whole DMA-BUF instance
> * struct drv_ipa_addr
>     * describes a memory region in a DMA-BUF instance
>
> For details, see the usage sample of each IPA driver.
>
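To make the two-level addressing concrete: a region is named by a buffer index plus an offset, so whoever consumes it has to check the pair against the buffer table before turning it into a flat address. The following is only a minimal sketch of that check. The sketch_* types and the function are hypothetical stand-ins written for this mail, not the definitions from the patch series, and the real driver may validate differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; the real definitions
 * live in the uapi headers of the patch series and may differ. */
struct sketch_buffer {
    uint64_t base;   /* start of the contiguous region (assumed known) */
    uint64_t size;   /* length of the region in bytes */
};

struct sketch_addr {
    uint32_t buffer_index;  /* which entry of the buffer table */
    uint32_t offset;        /* byte offset inside that buffer */
};

/* Resolve addr against a table of nbuf buffers, requiring len bytes to fit.
 * Returns 0 and stores the flat address in *out, or -1 when out of range. */
static int sketch_resolve(const struct sketch_buffer *bufs, size_t nbuf,
                          struct sketch_addr addr, uint64_t len, uint64_t *out)
{
    const struct sketch_buffer *b;

    if (addr.buffer_index >= nbuf)
        return -1;

    b = &bufs[addr.buffer_index];
    /* written to avoid overflow in offset + len */
    if (addr.offset > b->size || len > b->size - addr.offset)
        return -1;

    *out = b->base + addr.offset;
    return 0;
}
```

The point of keeping the index and offset separate is that userland never handles a physical address; only the side that owns the buffer table can resolve one.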
>
> #### Image Processing Accelerators overview
>
> Visconti5 SoC has the following image processing accelerators:
>
> * AFFINE: 1 input image, 1 output image; Affine transform, Homography transform, Polynomial lens distortion, LUT transform
> * DNN: N input feature vectors, N output feature vectors; Deep neural network operation
> * PYRAMID: 3 input images, 3 * N output images; Resize grayscale/color image with N different parameters
> * DSPIF: M input images, N output images; Various operations on images
> * HOX: 1 input image (multi ROI), 1 input dictionary, 1 likelihood/feature vector; Extended Histogram of Oriented Gradient based pattern matching
> * HAMAT: 2 input feature vectors, 1 output coordinate vector; Hamming distance matching for stereo vision
> * FLMAT: 3 input images, N input feature points, N output matched points; Optical flow matching
> * SMLDB: 1 input image, N input feature points, N output feature vectors; Accelerated-KAZE feature descriptor accelerator
> * STMAT: 2 input images, 1 output disparity image; Stereo disparity

It's not really easy to decide what is best. I would say that if both input and output
are images (RGB, YUV, Grayscale), then a V4L2 memory-to-memory driver is what I would
expect to see.

Where that is not the case, then a more custom approach makes sense.

In the list above I would put AFFINE, PYRAMID, DSPIF and possibly STMAT in the V4L2
driver group, and the others more as custom drivers.

I think it also depends on how it is used: if a captured sensor image is
typically passed in for further processing, i.e. it is closely related to the
video ISP, then V4L2 is a reasonable choice.

A DNN driver, on the other hand, isn't using images at all, so for that something
like this driver makes sense.

Regards,

Hans

>
> see [0] Fig 7.2.1 for block diagram (of prototype chip)
>
>
> #### DNN accelerator overview
>
> The DNN accelerator is a proprietary CNN/DCNN processing accelerator developed by Toshiba.
> Visconti5 SoC has 2 instances of the DNN accelerator hardware.
> Users convert existing Caffe/ONNX models to Visconti-compatible models with an offline tool.
> A converted model ("Configuration Binary") includes:
> * instruction sequence for given network
> * weight/bias information
> * DMA configuration from/to global memory (for input/output feature)
>
> The DNN accelerator can handle either 1 plane or multiple ROIs in a single call.
>
> see [0] Fig 7.2.2 for block diagram of DNN accelerator
>
> CNN: Convolutional Neural Network
> DCNN: Deep Convolutional Neural Network
>
>
> #### Input / Output
>
> Input image or feature: base type is one of FP16, FP32, INT8, UINT8, INT16
> Output feature vector: base type is one of FP16, FP32, INT8, UINT8, INT16
>
> Input, Output, Weight and Bias data can be placed in global memory and are loaded/stored by DMA within the DNN accelerator.
> Data in global memory can be specified as either:
> * a single address pointing to a single data block
> * a list of addresses pointing to multiple data blocks (i.e. ROIs)
>
> The DNN accelerator driver accepts an instance of "struct drv_dnn_descriptor", which includes the addresses of the input/output features and a configuration binary.
>
>
> #### Descriptor Builder in userland
>
> The following APIs are provided to build a descriptor instance in userland.
>
> /* defined in drv_dnn_util.h */
> int32_t drv_DNN_config_descript_init(struct drv_dnn_descriptor *desc, struct drv_ipa_buffer_info *buffer, int32_t buffer_num);
> int32_t drv_DNN_config_exec_configuration(struct drv_dnn_descriptor *desc, const void *configuration_binary,
>                                           struct drv_ipa_addr configuration_binary_addr, struct drv_ipa_addr *src_list,
>                                           struct drv_ipa_addr *dst_list, int32_t list_num, struct drv_ipa_addr temporary_addr,
>                                           int32_t temporary_size);
> int32_t drv_DNN_config_descript_finalize(struct drv_dnn_descriptor *desc);
>
> struct drv_dnn_descriptor is defined in drivers/soc/visconti/uapi/dnn.h.
> I think this header should be placed somewhere else so that it is collected by the "make headers_install" step of the kernel build.
>
>
> #### Usage sample (without error handlers)
>
> #include <fcntl.h>
> #include <poll.h>
> #include <stdint.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <linux/dma-heap.h>
> #include "drv_ipa.h"
> #include "drv_dnn.h"
> #include "drv_dnn_util.h"
>
> /* Allocate a DMA-BUF from the given heap; returns its fd, or -1 on failure. */
> int allocate_buffer(int fd_heap, int size)
> {
>     struct dma_heap_allocation_data heap_data_in = {0};
>     int ret;
>
>     heap_data_in.len = ROUNDUP_POW2(size);
>     heap_data_in.fd_flags = O_RDWR | O_CLOEXEC;
>
>     ret = ioctl(fd_heap, DMA_HEAP_IOCTL_ALLOC, &heap_data_in);
>     if (ret < 0)
>         return -1;
>
>     return heap_data_in.fd;
> }
>
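ROUNDUP_POW2 is used above but not defined anywhere in this mail. Going only by its name, I assume it rounds the requested size up to the next power of two. A minimal sketch of such a helper, under that assumption (sketch_roundup_pow2 is a hypothetical name, not something from the patch series):

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of two >= n. Returns 1 for n == 0,
 * and 0 if the result would not fit in size_t. */
static size_t sketch_roundup_pow2(size_t n)
{
    size_t p = 1;

    while (p < n) {
        if (p > (size_t)-1 / 2)
            return 0;   /* next doubling would overflow */
        p <<= 1;
    }
    return p;
}
```

Returning 0 on overflow lets the caller reject the request instead of passing a wrapped length to the allocation ioctl.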
> void dnn_sample(int fd_dnn, int fd_conf, int fd_src, int fd_dst, int fd_temp)
> {
>     struct drv_ipa_buffer_info bufinfo[4] = {
>         {.fd=fd_conf, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
>         {.fd=fd_src, .coherent=true, .direction=DRV_IPA_DIR_TO_DEVICE},
>         {.fd=fd_dst, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
>         {.fd=fd_temp, .coherent=true, .direction=DRV_IPA_DIR_FROM_DEVICE},
>     };
>     /* each address is an index into bufinfo[] plus a byte offset */
>     struct drv_ipa_addr conf_addr = {.buffer_index=0, .offset=0};
>     struct drv_ipa_addr src_addr = {.buffer_index=1, .offset=0};
>     struct drv_ipa_addr dst_addr = {.buffer_index=2, .offset=0};
>     struct drv_ipa_addr temp_addr = {.buffer_index=3, .offset=0};
>     struct drv_dnn_descriptor desc;
>
>     struct drv_ipa_addr src_list[] = {src_addr};
>     struct drv_ipa_addr dst_list[] = {dst_addr};
>
>     /* map the configuration binary so the descriptor builder can read it */
>     uint8_t *config = (uint8_t*)mmap(NULL, DNN_CONF_BIN_SIZE, PROT_READ, MAP_SHARED, fd_conf, 0);
>
>     drv_DNN_config_descript_init(&desc, bufinfo, 4);
>     drv_DNN_config_exec_configuration(&desc, config, conf_addr, src_list, dst_list, 1, temp_addr, TEMP_BUF_SIZE);
>     drv_DNN_config_descript_finalize(&desc);
>
>     /* start the accelerator, then wait up to 1000 ms for completion */
>     ioctl(fd_dnn, IOC_IPA_START, &desc);
>
>     {
>         struct pollfd fds[] = {{.fd=fd_dnn, .events=POLLIN, .revents=0}};
>         poll(fds, 1, 1000);
>     }
> }
>
> void sample(void)
> {
>     int fd_dnn, fd_heap, fd_conf, fd_src, fd_dst, fd_temp;
>
>     fd_dnn = open("/dev/dnn0", O_RDWR);
>     fd_heap = open("/dev/dma_heap/linux,cma", O_RDWR);
>     fd_conf = allocate_buffer(fd_heap, DNN_CONF_BIN_ALLOC_SIZE);
>     fd_src = allocate_buffer(fd_heap, INPUT_IMG_ALLOC_SIZE);
>     fd_dst = allocate_buffer(fd_heap, OUTPUT_IMG_ALLOC_SIZE);
>     fd_temp = allocate_buffer(fd_heap, TEMP_BUF_ALLOC_SIZE);
>
>     /* fill in input image and configuration here */
>
>     dnn_sample(fd_dnn, fd_conf, fd_src, fd_dst, fd_temp);
>
>     ...
> }
>
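The sample deliberately leaves out error handling, so the result of poll() is discarded and a timeout looks the same as a finished job. A minimal sketch of a wait helper that tells the cases apart, assuming the driver reports completion as a readable event on the device fd (that is my reading of the sample, not something the mail states). sketch_wait_readable is a hypothetical name.

```c
#include <poll.h>

/* Returns 1 when fd is readable, 0 on timeout, and -1 on poll() failure
 * or when only an error/hangup condition is reported. */
static int sketch_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;      /* includes EINTR; a caller may choose to retry */
    if (n == 0)
        return 0;       /* timed out, nothing ready */
    if (pfd.revents & POLLIN)
        return 1;
    return -1;          /* POLLERR, POLLHUP or POLLNVAL */
}
```

With this, the tail of dnn_sample() could treat a return of 0 as "accelerator did not finish within the timeout" rather than silently reading the output buffer.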
>
> #### Reference
>
> * [0] https://toshiba.semicon-storage.com/content/dam/toshiba-ss-v2/master/en/company/technical-review/pdf/technical-review-18_e.pdf
>     * Fig 7.2.1 shows the whole architecture of the prototype chip
>     * Fig 7.2.2 shows the architecture of the DNN accelerator
>
>
> Regards,
> Yuji
>
>> -----Original Message-----
>> From: Hans Verkuil <hverkuil@xxxxxxxxx>
>> Sent: Friday, May 20, 2022 7:03 PM
>> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
>> <yuji2.ishikawa@xxxxxxxxxxxxx>; robh+dt@xxxxxxxxxx; iwamatsu nobuhiro(岩松
>> 信洋 □SWC◯ACT) <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>;
>> sumit.semwal@xxxxxxxxxx; christian.koenig@xxxxxxx
>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> linux-media@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
>> linaro-mm-sig@xxxxxxxxxxxxxxxx
>> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
>> accelerator driver
>>
>> Hi Yuji,
>>
>> On 5/20/22 11:48, yuji2.ishikawa@xxxxxxxxxxxxx wrote:
>>> Hi Hans,
>>>
>>> Thank you for your comment.
>>> I agree that this submission lacks documentation describing the basic idea of the accelerators: what they accept and what they yield.
>>> Where can I put a new document? Can I put it as a comment in a source file? Can I add a file under the Documentation/misc-devices directory?
>>
>> Start with explaining it by replying to this mail. Without knowing anything about
>> the hardware, it is difficult to say what the best place is. Usually it is either the
>> public API header, or somewhere in Documentation.
>>
>> The first step is to have a better understanding of the Visconti image hardware
>> and to see what the best subsystem would be to support that hardware.
>>
>> Regards,
>>
>> Hans
>>
>>>
>>> Thanks,
>>> Yuji Ishikawa
>>>
>>>> -----Original Message-----
>>>> From: Hans Verkuil <hverkuil@xxxxxxxxx>
>>>> Sent: Thursday, May 12, 2022 8:15 PM
>>>> To: ishikawa yuji(石川 悠司 ○RDC□AITC○EA開)
>>>> <yuji2.ishikawa@xxxxxxxxxxxxx>; Rob Herring <robh+dt@xxxxxxxxxx>;
>>>> iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT)
>>>> <nobuhiro1.iwamatsu@xxxxxxxxxxxxx>; Sumit Semwal
>>>> <sumit.semwal@xxxxxxxxxx>; Christian König
>> <christian.koenig@xxxxxxx>
>>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
>>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx
>>>> Subject: Re: [PATCH 0/4] Add Toshiba Visconti DNN image processing
>>>> accelerator driver
>>>>
>>>> Hi Yuji,
>>>>
>>>> On 4/28/22 15:11, Yuji Ishikawa wrote:
>>>>> This series is the DNN image processing accelerator driver for Toshiba's ARM SoC, Visconti[0].
>>>>> This provides DT binding documentation, device driver, MAINTAINER files.
>>>>>
>>>>> The second patch "soc: visconti: Add Toshiba Visconti image processing accelerator common source"
>>>>> and the fourth patch "MAINTAINERS: ..." are the same as the ones in the preceding post for affine driver.
>>>>
>>>> There appears to be no documentation whatsoever, unless I am missing
>>>> something.
>>>>
>>>> How is the uAPI supposed to be used? What does it do? What formats
>>>> does it accept or produce?
>>>>
>>>> If this processes images, then (as Laurent mentioned) this is more
>>>> suitable as a
>>>> V4L2 mem2mem driver.
>>>>
>>>> See
>>>> https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dev-mem2mem.html
>>>> and the many drivers in drivers/media that use it (git grep v4l2-mem2mem.h).
>>>>
>>>> But without any explanation whatsoever I have no idea what does or
>>>> does not make sense.
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>>>
>>>>>
>>>>> Best regards,
>>>>> Yuji
>>>>>
>>>>> [0]:
>>>>> https://toshiba.semicon-storage.com/ap-en/semiconductor/product/image-recognition-processors-visconti.html
>>>>>
>>>>> Yuji Ishikawa (4):
>>>>> dt-bindings: soc: visconti: Add Toshiba Visconti DNN image processing
>>>>> accelerator bindings
>>>>> soc: visconti: Add Toshiba Visconti image processing accelerator
>>>>> common source
>>>>> soc: visconti: Add Toshiba Visconti DNN image processing accelerator
>>>>> MAINTAINERS: Add entries for Toshiba Visconti DNN image processing
>>>>> accelerator
>>>>>
>>>>> .../soc/visconti/toshiba,visconti-dnn.yaml | 54 ++
>>>>> MAINTAINERS | 2 +
>>>>> drivers/soc/Kconfig | 1 +
>>>>> drivers/soc/Makefile | 1 +
>>>>> drivers/soc/visconti/Kconfig | 7 +
>>>>> drivers/soc/visconti/Makefile | 8 +
>>>>> drivers/soc/visconti/dnn/Makefile | 6 +
>>>>> drivers/soc/visconti/dnn/dnn.c | 533 ++++++++++++++++++
>>>>> drivers/soc/visconti/dnn/hwd_dnn.c | 183 ++++++
>>>>> drivers/soc/visconti/dnn/hwd_dnn.h | 68 +++
>>>>> drivers/soc/visconti/dnn/hwd_dnn_reg.h | 228 ++++++++
>>>>> drivers/soc/visconti/ipa_common.c | 55 ++
>>>>> drivers/soc/visconti/ipa_common.h | 18 +
>>>>> drivers/soc/visconti/uapi/dnn.h | 77 +++
>>>>> drivers/soc/visconti/uapi/ipa.h | 88 +++
>>>>> 15 files changed, 1329 insertions(+)
>>>>> create mode 100644 Documentation/devicetree/bindings/soc/visconti/toshiba,visconti-dnn.yaml
>>>>> create mode 100644 drivers/soc/visconti/Kconfig
>>>>> create mode 100644 drivers/soc/visconti/Makefile
>>>>> create mode 100644 drivers/soc/visconti/dnn/Makefile
>>>>> create mode 100644 drivers/soc/visconti/dnn/dnn.c
>>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.c
>>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn.h
>>>>> create mode 100644 drivers/soc/visconti/dnn/hwd_dnn_reg.h
>>>>> create mode 100644 drivers/soc/visconti/ipa_common.c
>>>>> create mode 100644 drivers/soc/visconti/ipa_common.h
>>>>> create mode 100644 drivers/soc/visconti/uapi/dnn.h
>>>>> create mode 100644 drivers/soc/visconti/uapi/ipa.h
>>>>>