Re: dmaengine: pl330 rare NULL pointer dereference in pl330_tasklet
From: Marek Szyprowski
Date: Mon Nov 02 2020 - 02:38:38 EST
Hi Krzysztof,
On 31.10.2020 20:01, Krzysztof Kozlowski wrote:
> I hit quite rare issue with pl330 DMA driver, difficult to reproduce
> (actually failed to do so):
>
> Happened during early reboot
>
> [ OK ] Stopped target Graphical Interface.
> [ OK ] Stopped target Multi-User System.
> [ OK ] Stopped target RPC Port Mapper.
> Stopping OpenSSH Daemonti[ 75.447904] 8<--- cut here ---
> [ 75.449506] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> ...
> [ 75.690850] [<c0902f70>] (pl330_tasklet) from [<c034d460>] (tasklet_action_common+0x88/0x1f4)
> [ 75.699340] [<c034d460>] (tasklet_action_common) from [<c03013f8>] (__do_softirq+0x108/0x428)
> [ 75.707850] [<c03013f8>] (__do_softirq) from [<c034dadc>] (run_ksoftirqd+0x2c/0x4c)
> [ 75.715486] [<c034dadc>] (run_ksoftirqd) from [<c036fbfc>] (smpboot_thread_fn+0x13c/0x24c)
> [ 75.723693] [<c036fbfc>] (smpboot_thread_fn) from [<c036c18c>] (kthread+0x13c/0x16c)
> [ 75.731390] [<c036c18c>] (kthread) from [<c03001a8>] (ret_from_fork+0x14/0x2c)
>
> Full log:
> https://krzk.eu/#/builders/20/builds/954/steps/22/logs/serial0
>
> 1. Arch ARM Linux
> 2. multi_v7_defconfig
> 3. Odroid HC1, ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC
> 4. systemd, boot up with static IP set in kernel command line
> 5. No swap
> 6. Kernel, DTB and initramfs are downloaded with TFTP
> 7. NFS root (NFS client) mounted from a NFSv4 server
>
> Since I was not able to reproduce it, obviously I did not run bisect. If
> anyone has ideas, please share.
Well, I've also observed it a few times. IMHO it is related to the
broken shutdown procedure of the UART in DMA mode. It usually shows up
as random parts of the previously transmitted data being flushed to the
UART console during system shutdown. This also depends on the board and
on the userspace in use (especially the presence of systemd, which
handles the UART differently than the old sysv init). IMHO there is a
kind of use-after-free issue there, so the above pl330 stacktrace can
also be observed, depending on the timing and system load. This issue
has been there since DMA support was first added. I have it on my todo
list, but its priority has been too low for me to look into it. I only
briefly checked the related code a few years ago and noticed that the
UART shutdown is not really synchronized with DMA. At that time,
however, I didn't find any simple fix, so I gave up.
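
The suspected ordering problem can be modelled in user space roughly as
follows. This is only a sketch and not kernel code: all names in it
(model_chan, client_state, model_terminate_async, model_terminate_sync)
are made up for illustration, and it assumes the failure is a deferred
completion handler running after the client has already released its
state. The dmaengine client API does provide dmaengine_terminate_sync()
and dmaengine_synchronize() for this kind of ordering, but whether they
would fix this particular crash has not been verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, NOT kernel code. */
struct client_state {
	bool released;      /* set once the client has torn down its resources */
	int  completions;   /* completions handled while the state was valid */
};

struct model_chan {
	struct client_state *client; /* what the deferred handler touches */
	bool handler_pending;        /* completion scheduled, not yet run */
};

/*
 * Deferred completion handler (stands in for a tasklet).
 * Returns false if it ran against already released client state,
 * which in real code would be a use-after-free.
 */
static bool run_pending_handler(struct model_chan *ch)
{
	assert(ch != NULL && ch->client != NULL);
	if (!ch->handler_pending)
		return true;
	ch->handler_pending = false;
	if (ch->client->released)
		return false;
	ch->client->completions++;
	return true;
}

/* Stop the channel but leave an already scheduled handler queued. */
static void model_terminate_async(struct model_chan *ch)
{
	(void)ch;
}

/* Stop the channel and make sure no handler can run afterwards. */
static void model_terminate_sync(struct model_chan *ch)
{
	ch->handler_pending = false;
}

/* Shutdown that releases client state without waiting for the handler. */
static bool shutdown_unsynchronized(void)
{
	struct client_state st = { .released = false, .completions = 0 };
	struct model_chan ch = { .client = &st, .handler_pending = true };

	model_terminate_async(&ch);
	st.released = true;              /* client tears down too early */
	return run_pending_handler(&ch); /* deferred work runs later */
}

/* Shutdown that synchronizes with the handler before releasing state. */
static bool shutdown_synchronized(void)
{
	struct client_state st = { .released = false, .completions = 0 };
	struct model_chan ch = { .client = &st, .handler_pending = true };

	model_terminate_sync(&ch);
	st.released = true;              /* safe: nothing is pending anymore */
	return run_pending_handler(&ch);
}
```

In this model shutdown_unsynchronized() reports that the handler touched
released state, while shutdown_synchronized() does not. The only
difference between the two is whether the pending handler is dealt with
before the client state goes away.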
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland