Re: [RFC PATCH v3 4/6] media: tegra: Add Tegra210 Video input driver

From: Sowjanya Komatineni
Date: Wed Feb 19 2020 - 11:22:22 EST



On 2/19/20 7:08 AM, Hans Verkuil wrote:
External email: Use caution opening links or attachments


On 2/18/20 4:19 AM, Sowjanya Komatineni wrote:
On 2/17/20 5:04 PM, Sowjanya Komatineni wrote:
On 2/17/20 4:59 PM, Sowjanya Komatineni wrote:
On 2/17/20 12:04 AM, Hans Verkuil wrote:
External email: Use caution opening links or attachments


On 2/16/20 9:22 PM, Sowjanya Komatineni wrote:
On 2/16/20 12:11 PM, Sowjanya Komatineni wrote:
On 2/16/20 11:54 AM, Sowjanya Komatineni wrote:
On 2/16/20 3:03 AM, Hans Verkuil wrote:
External email: Use caution opening links or attachments


On 2/14/20 7:23 PM, Sowjanya Komatineni wrote:
Tegra210 contains a powerful Video Input (VI) hardware controller
which can support up to 6 MIPI CSI camera sensors.

Each Tegra CSI port can be one-to-one mapped to VI channel and can
capture from an external camera sensor connected to CSI or from
built-in test pattern generator.

Tegra210 supports built-in test pattern generator from CSI to VI.

This patch adds a V4L2 media controller and capture driver support
for Tegra210 built-in CSI to VI test pattern generator.

Signed-off-by: Sowjanya Komatineni <skomatineni@xxxxxxxxxx>
---
drivers/staging/media/Kconfig |ÂÂÂ 2 +
drivers/staging/media/Makefile |ÂÂÂ 1 +
drivers/staging/media/tegra/Kconfig |ÂÂ 10 +
drivers/staging/media/tegra/Makefile |ÂÂÂ 8 +
drivers/staging/media/tegra/TODO |ÂÂ 10 +
drivers/staging/media/tegra/tegra-common.h |Â 239 +++++++
drivers/staging/media/tegra/tegra-csi.c |Â 374 ++++++++++
drivers/staging/media/tegra/tegra-csi.h |Â 115 ++++
drivers/staging/media/tegra/tegra-vi.c | 1019
++++++++++++++++++++++++++++
drivers/staging/media/tegra/tegra-vi.h |ÂÂ 79 +++
drivers/staging/media/tegra/tegra-video.c |Â 118 ++++
drivers/staging/media/tegra/tegra-video.h |ÂÂ 32 +
drivers/staging/media/tegra/tegra210.c |Â 767
+++++++++++++++++++++
drivers/staging/media/tegra/tegra210.h |Â 190 ++++++
14 files changed, 2964 insertions(+)
create mode 100644 drivers/staging/media/tegra/Kconfig
create mode 100644 drivers/staging/media/tegra/Makefile
create mode 100644 drivers/staging/media/tegra/TODO
create mode 100644 drivers/staging/media/tegra/tegra-common.h
create mode 100644 drivers/staging/media/tegra/tegra-csi.c
create mode 100644 drivers/staging/media/tegra/tegra-csi.h
create mode 100644 drivers/staging/media/tegra/tegra-vi.c
create mode 100644 drivers/staging/media/tegra/tegra-vi.h
create mode 100644 drivers/staging/media/tegra/tegra-video.c
create mode 100644 drivers/staging/media/tegra/tegra-video.h
create mode 100644 drivers/staging/media/tegra/tegra210.c
create mode 100644 drivers/staging/media/tegra/tegra210.h

<snip>

+/*
+ * videobuf2 queue operations
+ */
+static int tegra_channel_queue_setup(struct vb2_queue *vq,
+ unsigned int *nbuffers,
+ unsigned int *nplanes,
+ unsigned int sizes[],
+ struct device *alloc_devs[])
+{
+ struct tegra_vi_channel *chan = vb2_get_drv_priv(vq);
+
+ if (*nplanes)
+ return sizes[0] < chan->format.sizeimage ?
-EINVAL : 0;
+
+ *nplanes = 1;
+ sizes[0] = chan->format.sizeimage;
+ alloc_devs[0] = chan->vi->dev;
+
+ /*
+ * allocate min 3 buffers in queue to avoid race between DMA
+ * writes and userspace reads.
+ */
+ if (*nbuffers < 3)
+ *nbuffers = 3;
First of all, don't check this here, instead set the struct
vb2_queue field
'min_buffers_needed' to 3 instead.

But the reason given for this check is peculiar: there should
not be
any
race at all. Usually the reason for requiring a specific minimum
number of
buffers is that the DMA engine needs at least 2 buffers before it
can start
streaming: it can't give back a buffer to userspace
(vb2_buffer_done())
unless there is a second buffer it can start to capture to next. So
for many
DMA implementations you need a minimum of 2 buffers: two buffers
for
the
DMA engine, one buffer being processed by userspace.

If the driver is starved of buffers it will typically keep
capturing to
the last buffer until a new buffer is queued.

In any case, once the driver releases a buffer via
vb2_buffer_done()
the
buffer memory is no longer owned by the driver.

To be precise, buffer ownership is as follows:

userspace -> VIDIOC_QBUF -> vb2 -> buf_queue -> driver ->
vb2_buffer_done() -> vb2 -> VIDIOC_DQBUF -> userspace

(vb2 == videobuf2 framework)

Note that vb2 never touches the buffer memory.

So if you get a race condition in this driver, then there is
something
strange going on. It looks like vb2_buffer_done() is called while
DMA is
still ongoing, or because the driver really needs to keep one
buffer
available at all times.

Regards,

Hans
Thanks Hans.

On running v4l2-compliance streaming tests for longer run, I noticed
kernel reporting unable to write to read-only memory and with debugs
I observed when this error was reported, I see 2 buffers queued and
both using same address.

for first buffer capture start thread initiates capture and wakes
done thread to wait for memory write ack and once its done buffer is
released to user space but I see upon buffer released to user space
immediate next buffer capture single shot gets issued (as soon as
single shot is issued frame capture data is written to memory by
DMA)
and I see this kernel error of unable to write to read-only memory.

This error happens rare and happens on long run and all the times of
repro's, I see when other thread releases buffer immediate I see
single shot gets issued as 2 buffers are queued up at the same time
with same DMA address.

Just to be clear, I meant all the times when kernel reports error
unable to write to read-only memory, I see 2 buffers gets queued and
as the capture start thread and done thread are parallel and when
capture thread wakes done thread on receiving FS event, done thread
for waiting for memory write happens parallel to next frame capture
and I see while vb2_buffer_done happens in done thread next frame
single shot has been issues by capture start thread in parallel when
it hits this error.
For low latency, we use 2 threads one thread for capture and wait
for FS
and on receiving FS even wakes other done thread to wait for memory
write to finish.

While other done thread waits for memory write to finish, capture
thread
can start capture for next frame and as soon as single shot is issued
capture frame is written to memory and as this thread runs in parallel
to done thread

there is a possibility vb2_buffer_done being called by
kthread_capture_done while DMA is ongoing by kthread_capture_start
and I
observed same DMA address being used got both buffers that got
queued at
same time when it hits this error.
"buffers that got queued": you mean that
tegra_channel_buffer_queue() is
called twice with different buffers (i.e. with different buffer
index values)
but with the same DMA address?
yes I see buf_queue invoked twice and both using same DMA address.

I did not monitored buf index when I repro'd with debugs.

That should not happen (unless the first buffer was returned with
vb2_buffer_done() before the second buffer was queued).

Can you provide more details? E.g. buffer index, memory model used when
streaming, total number of buffers allocated by REQBUFS.
memory type is MMAP. buffer count is 1. Didn't monitored buf index
but when this happened as count is 1 so I think it might be index 0.

But what I noticed is on running long hours rarely this repro's and
at the time of repro, I see 2 immediate buf_queue with same DMA address.

1st buffer capture starts and received FS and wakes other thread to
wait for MW_ACK_DONE and when capture thread executes for 2nd buffer
right as soon as single shot bit is set which writes captured data to
memory kernel error of unable to write to read-only memory happens.

I couldn't add more debugs to confirm if 1st buffer release finished
or not by the time 2nd buffer single shot is issued but I see
MW_ACK_DONE event received for 1st buffer.

Adding more debugs does not repro this kernel error and that could be
because delays with debug messages helps time interval b/w buffer
release and next buffer single shot.

I would like to fully understand this. Just increasing the minimum
number
of buffers, while reasonable by itself, *does* feel like it is just
hiding the symptoms.

Regards,

Hans
kthread for capture start and kthread for done which waiting for
memory write to happen runs in parallel.

Its hard to confirm if buffer done is invoked by kthread_done at same
time of kthread_capture DMA write of next frame as adding debugs has
impact on delay and doesnt repro with more debugs.

But just increasing no.of min buffers to 3 doesnt repro at all and I
never see 2 buffer queue requests with same DMA address as min
buffers are 3.

Will add debugs and monitor with index again and check for repro by
removing min buffers to re-confirm ...
Without specifying min buffers needed, I am able to repro again.

Noticed the buf that got released previously is the one that gets next
into the queue. So I don't see same buffer being used b/w captures.
Somehow I might have misinterpreted timing from logs earlier.

This time repro happened after streaming stop and on next streaming
buf_prepare call back (with in tegra_channel_buffer_prepare)

Earlier repro's call trace also showed all the time it happened in
tegra_channel_buffer_prepare but I noticed that happened when capture
start thread was running in parallel and right on single shot issued.

So, looks like I misunderstood that it happened b/w issuing single shot
and releasing buffer but could be its being reported during buffer
prepare itself.

All the times of repro, call trace shows it happens during buf_prepare.

[ÂÂ 41.213146] Unable to handle kernel write to read-only memory at
virtual address ffff0000f5c3fff8
This is weird: tegra_channel_buffer_prepare doesn't access any of the
actual buffer memory, it just reads data from structs like vb2_buffer, etc.

This indicates a memory overwrite or something similar. Enabling KASAN
in the kernel and poisoning freed memory caused the following (100% reproducible)
crash with 'v4l2-ctl --stream-mmap':

[ 135.698408] BUG: KASAN: slab-out-of-bounds in tegra_channel_buffer_prepare+0x5c/0x100
[ 135.706277] Write of size 8 at addr ffff0000c9cfdff8 by task v4l2-ctl/3030
[ 135.713179]
[ 135.714687] CPU: 2 PID: 3030 Comm: v4l2-ctl Tainted: G W 5.6.0-rc1-arm64 #48
[ 135.723076] Hardware name: NVIDIA Jetson TX1 Developer Kit (DT)
[ 135.729023] Call trace:
[ 135.731490] dump_backtrace+0x0/0x278
[ 135.735172] show_stack+0x14/0x20
[ 135.738510] dump_stack+0xe0/0x134
[ 135.741933] print_address_description.isra.0+0x68/0x348
[ 135.747270] __kasan_report+0x114/0x214
[ 135.751126] kasan_report+0xc/0x18
[ 135.754546] __asan_store8+0x94/0xb8
[ 135.758142] tegra_channel_buffer_prepare+0x5c/0x100
[ 135.763133] __buf_prepare+0x1fc/0x508
[ 135.766902] vb2_core_qbuf+0x484/0x7c8
[ 135.770672] vb2_qbuf+0x100/0x148
[ 135.774004] vb2_ioctl_qbuf+0x90/0xa8
[ 135.777688] v4l_qbuf+0x54/0x68
[ 135.780847] __video_do_ioctl+0x288/0x5a0
[ 135.784879] video_usercopy+0x1f0/0x640
[ 135.788735] video_ioctl2+0x14/0x1c
[ 135.792244] v4l2_ioctl+0x98/0xb0
[ 135.795580] ksys_ioctl+0x2fc/0x10c8
[ 135.799175] __arm64_sys_ioctl+0x40/0x58
[ 135.803120] do_el0_svc+0xb8/0x228
[ 135.806542] el0_sync_handler+0x14c/0x2a8
[ 135.810572] el0_sync+0x140/0x180
[ 135.813902]
[ 135.815403] Allocated by task 3028:
[ 135.818913] save_stack+0x24/0xb0
[ 135.822247] __kasan_kmalloc.isra.0+0xc0/0xe0
[ 135.826626] kasan_kmalloc+0xc/0x18
[ 135.830133] __kmalloc+0x198/0x2e8
[ 135.833554] __vb2_queue_alloc+0xa4/0x858
[ 135.837584] vb2_core_reqbufs+0x360/0x648
[ 135.841613] vb2_ioctl_reqbufs+0xe4/0x140
[ 135.845644] v4l_reqbufs+0x64/0x78
[ 135.849064] __video_do_ioctl+0x288/0x5a0
[ 135.853095] video_usercopy+0x1f0/0x640
[ 135.856952] video_ioctl2+0x14/0x1c
[ 135.860459] v4l2_ioctl+0x98/0xb0
[ 135.863793] ksys_ioctl+0x2fc/0x10c8
[ 135.867387] __arm64_sys_ioctl+0x40/0x58
[ 135.871331] do_el0_svc+0xb8/0x228
[ 135.874750] el0_sync_handler+0x14c/0x2a8
[ 135.878779] el0_sync+0x140/0x180
[ 135.882109]
[ 135.883609] Freed by task 3028:
[ 135.886767] save_stack+0x24/0xb0
[ 135.890099] __kasan_slab_free+0x108/0x180
[ 135.894215] kasan_slab_free+0x10/0x18
[ 135.897982] kfree+0x9c/0x2d0
[ 135.900965] __vb2_queue_free+0x774/0xb30
[ 135.904996] vb2_core_reqbufs+0x138/0x648
[ 135.909026] vb2_ioctl_reqbufs+0xe4/0x140
[ 135.913059] v4l_reqbufs+0x64/0x78
[ 135.916479] __video_do_ioctl+0x288/0x5a0
[ 135.920510] video_usercopy+0x1f0/0x640
[ 135.924367] video_ioctl2+0x14/0x1c
[ 135.927875] v4l2_ioctl+0x98/0xb0
[ 135.931209] ksys_ioctl+0x2fc/0x10c8
[ 135.934804] __arm64_sys_ioctl+0x40/0x58
[ 135.938748] do_el0_svc+0xb8/0x228
[ 135.942169] el0_sync_handler+0x14c/0x2a8
[ 135.946199] el0_sync+0x140/0x180
[ 135.949528]
[ 135.951030] The buggy address belongs to the object at ffff0000c9cfd000
[ 135.951030] which belongs to the cache kmalloc-2k of size 2048
[ 135.963604] The buggy address is located 2040 bytes to the right of
[ 135.963604] 2048-byte region [ffff0000c9cfd000, ffff0000c9cfd800)
[ 135.976091] The buggy address belongs to the page:
[ 135.980909] page:fffffe0003073e00 refcount:1 mapcount:0 mapping:ffff0000dc00ef80 index:0x0 compound_mapcount: 0
[ 135.991046] flags: 0x2fffc00000010200(slab|head)
[ 135.995691] raw: 2fffc00000010200 fffffe0003073600 0000000300000003 ffff0000dc00ef80
[ 136.003473] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[ 136.011252] page dumped because: kasan: bad access detected
[ 136.016848]
[ 136.018349] Memory state around the buggy address:
[ 136.023164] ffff0000c9cfde80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 136.030423] ffff0000c9cfdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 136.037681] >ffff0000c9cfdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 136.044935] ^
[ 136.052104] ffff0000c9cfe000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 136.059362] ffff0000c9cfe080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 136.066617] ==================================================================
[ 136.073871] Disabling lock debugging due to kernel taint

Note: I had to disable the nouveau driver since that also has a KASAN bug in
v5.6-rc1.

This happens with the driver as-posted, so with a 3 buffer minimum.

Hopefully this will help you find the root cause of this issue.

Regards,

Hans
Thanks Hans. Probably dma address is not aligned properly. Will check.

[ÂÂ 41.222012] Mem abort info:
[ÂÂ 41.224807]ÂÂ ESR = 0x9600004f
[ÂÂ 41.227852]ÂÂ EC = 0x25: DABT (current EL), IL = 32 bits
[ÂÂ 41.233160]ÂÂ SET = 0, FnV = 0
[ÂÂ 41.236204]ÂÂ EA = 0, S1PTW = 0
[ÂÂ 41.239344] Data abort info:
[ÂÂ 41.242225]ÂÂ ISV = 0, ISS = 0x0000004f
[ÂÂ 41.246058]ÂÂ CM = 0, WnR = 1
[ÂÂ 41.249026] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081498000
[ÂÂ 41.255733] [ffff0000f5c3fff8] pgd=000000017f1f8003,
pud=000000017ec06003, pmd=000000017ea57003, pte=0060000175c3f793
[ÂÂ 41.266345] Internal error: Oops: 9600004f [#1] PREEMPT SMP
[ÂÂ 41.271905] Modules linked in: panel_simple tegra_drm
snd_hda_codec_hdmi snd_hda_tegra crct10dif_ce snd_hda_codec cec
drm_kms_helper snd_hda_core lp855x_bl drm pwm_tegra ip_tables x_tables
ipv6 nf_defrag_ipv6
[ÂÂ 41.290401] CPU: 3 PID: 532 Comm: v4l2-compliance Tainted: G
W 5.6.0-rc1-00035-g6a105c1c479a-dirty #1
[ÂÂ 41.300902] Hardware name: NVIDIA Jetson TX1 Developer Kit (DT)
[ÂÂ 41.306807] pstate: 60000005 (nZCv daif -PAN -UAO)
[ÂÂ 41.311593] pc : tegra_channel_buffer_prepare+0x34/0x88
[ÂÂ 41.316807] lr : __buf_prepare+0x1c4/0x230
[ÂÂ 41.320891] sp : ffff800011f5baa0
[ÂÂ 41.324195] x29: ffff800011f5baa0 x28: ffff0000f58cc100
[ÂÂ 41.329494] x27: ffff800011f5bc58 x26: ffff80001100b780
[ÂÂ 41.334792] x25: ffff0000f81f1608 x24: ffff0000f7be7c00
[ÂÂ 41.340091] x23: 00000000c058565d x22: 0000000000000000
[ÂÂ 41.345390] x21: ffff0000f81f16e8 x20: 0000000000000000
[ÂÂ 41.350688] x19: ffff0000f5c40000 x18: 0000000000000000
[ÂÂ 41.355986] x17: 0000000000000000 x16: 0000000000000000
[ÂÂ 41.361285] x15: ffff0000f8553800 x14: 0000000000000000
[ÂÂ 41.366583] x13: 003f480000000000 x12: 003f500000000000
[ÂÂ 41.371881] x11: 0000000100000000 x10: 0000000000000000
[ÂÂ 41.377180] x9 : 0000000000000000 x8 : ffff0000f5c40258
[ÂÂ 41.382478] x7 : 0000000000000030 x6 : 0000000000000001
[ÂÂ 41.387776] x5 : 0000000000000000 x4 : 00000000003f4800
[ÂÂ 41.393074] x3 : 00000000003f4800 x2 : 00000000003f4800
[ÂÂ 41.398373] x1 : ffff0000f81f1080 x0 : ffff0000f5c40000
[ÂÂ 41.403671] Call trace:
[ÂÂ 41.406109]Â tegra_channel_buffer_prepare+0x34/0x88
[ÂÂ 41.410974]Â __buf_prepare+0x1c4/0x230
[ÂÂ 41.414713]Â vb2_core_prepare_buf+0x94/0x110
[ÂÂ 41.418971]Â vb2_prepare_buf+0x74/0xa8
[ÂÂ 41.422710]Â vb2_ioctl_prepare_buf+0x54/0x60
[ÂÂ 41.426970]Â v4l_prepare_buf+0x44/0x58
[ÂÂ 41.430707]Â __video_do_ioctl+0x228/0x3e8
[ÂÂ 41.434705]Â video_usercopy+0x1cc/0x4d0
[ÂÂ 41.438531]Â video_ioctl2+0x14/0x20
[ÂÂ 41.442010]Â v4l2_ioctl+0x44/0x68
[ÂÂ 41.445316]Â v4l2_compat_ioctl32+0x21c/0x1420
[ÂÂ 41.449665]Â __arm64_compat_sys_ioctl+0xc8/0x108
[ÂÂ 41.454273]Â el0_svc_common.constprop.2+0xb0/0x168
[ÂÂ 41.459051]Â do_el0_svc_compat+0x18/0x38
[ÂÂ 41.462964]Â el0_sync_compat_handler+0x13c/0x194
[ÂÂ 41.467570]Â el0_sync_compat+0x144/0x180
[ÂÂ 41.471483] Code: b9407802 eb02007f 540001e8 b9007404 (f81f8001)
[ÂÂ 41.477563] ---[ end trace 051c84051f60870a ]---

With using minimum 3 buffers, this issue doesnt happen at all from
almost 72 hours of testing.


Will try with setting vb2 queue field min_buffers_needed as 3
instead
of adding check in queue setup.

+
+ return 0;
+}