Re: [PATCH] drm/atmel-hlcdc: Release CRTC commit when destroying plane state
From: Ludovic.Desroches
Date: Thu Mar 21 2024 - 02:46:13 EST
On 3/15/24 13:25, Pierre-Louis Dourneau wrote:
> [You don't often get email from pl.dourneau@xxxxxxxxxx. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 3/12/24, Pierre-Louis Dourneau <pl.dourneau@xxxxxxxxxx> wrote:
>> On 3/8/24, Ludovic.Desroches@xxxxxxxxxxxxx <Ludovic.Desroches@xxxxxxxxxxxxx> wrote:
>>> This patch fixes the memory leak but introduces a crash on my side when
>>> exiting a graphics app using the Microchip graphics library.
>>
>> We've tried to reproduce your crash with 6.1.22-linux4microchip-2023.04,
>> to no avail. We'll try to upgrade to 6.1.55-linux4microchip-2023.10 (your
>> version) and test again.
>
> I was able to test a few more recent kernel versions[0] with the patch
> applied. None yielded any crash, be it running Microchip's EGT samples[1]
> or libdrm's modetest. Although, what I did manage to reproduce was a
> refcount underflow similar to the one you had:
>
> # modetest -M atmel-hlcdc -s 32:#0 -P 33@47:800x400@XR24 -a
> setting mode 1024x600-65.48Hz on connectors 32, crtc 47
> testing 800x400@XR24 on plane 33, crtc 47
> [ 75.736699] ------------[ cut here ]------------
> [ 75.741353] WARNING: CPU: 0 PID: 200 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x15c
> [ 75.749731] refcount_t: underflow; use-after-free.
> [ 75.754574] CPU: 0 PID: 200 Comm: modetest Not tainted 6.1.55-linux4microchip-2023.10 #4
> [ 75.762915] Hardware name: Microchip SAM9X60
> [ 75.767198] unwind_backtrace from show_stack+0x10/0x18
> [ 75.772423] show_stack from dump_stack_lvl+0x28/0x34
> [ 75.777479] dump_stack_lvl from __warn+0x8c/0xc0
> [ 75.782187] __warn from warn_slowpath_fmt+0x74/0xa8
> [ 75.787158] warn_slowpath_fmt from refcount_warn_saturate+0xf0/0x15c
> [ 75.793611] refcount_warn_saturate from __drm_atomic_helper_plane_destroy_state+0xd0/0xd4
> [ 75.801894] __drm_atomic_helper_plane_destroy_state from atmel_hlcdc_plane_atomic_destroy_state+0x38/0x48
> [ 75.811573] atmel_hlcdc_plane_atomic_destroy_state from drm_atomic_state_default_clear+0x1c4/0x2fc
> [ 75.820642] drm_atomic_state_default_clear from __drm_atomic_state_free+0x7c/0xb0
> [ 75.828228] __drm_atomic_state_free from drm_mode_atomic_ioctl+0x868/0xb88
> [ 75.835204] drm_mode_atomic_ioctl from drm_ioctl+0x200/0x3c4
> [ 75.840960] drm_ioctl from sys_ioctl+0x240/0xb48
> [ 75.845669] sys_ioctl from ret_fast_syscall+0x0/0x44
> [ 75.850725] Exception stack(0xc8c91fa8 to 0xc8c91ff0)
> [ 75.855794] 1fa0: 004b0ec0 00000003 00000003 c03864bc becd7bc8 becd7b98
> [ 75.863992] 1fc0: 004b0ec0 00000003 becd7bc8 00000036 00000003 00000003 b6f22f80 00000400
> [ 75.872183] 1fe0: b6f21e74 becd7a68 b6f07494 b6f61cc8
> [ 75.877289] ---[ end trace 0000000000000000 ]---
> Atomic Commit failed [1]
>
> Same but without using the atomic uAPI (`-a` option removed):
>
> # modetest -M atmel-hlcdc -s 32:#0 -P 33@47:800x400@XR24
> setting mode 1024x600-65.48Hz on connectors 32, crtc 47
> testing 800x400@XR24 overlay plane 33
> failed to enable plane: Invalid argument
>
> [ 98.542958] ------------[ cut here ]------------
> [ 98.547547] WARNING: CPU: 0 PID: 28 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x15c
> [ 98.555902] refcount_t: underflow; use-after-free.
> [ 98.560698] CPU: 0 PID: 28 Comm: kworker/0:7 Tainted: G W 6.1.55-linux4microchip-2023.10 #6
> [ 98.570695] Hardware name: Microchip SAM9X60
> [ 98.574972] Workqueue: events drm_mode_rmfb_work_fn
> [ 98.579859] unwind_backtrace from show_stack+0x10/0x18
> [ 98.587615] show_stack from dump_stack_lvl+0x28/0x34
> [ 98.595201] dump_stack_lvl from __warn+0x8c/0xc0
> [ 98.602438] __warn from warn_slowpath_fmt+0x74/0xa8
> [ 98.609937] warn_slowpath_fmt from refcount_warn_saturate+0xf0/0x15c
> [ 98.618919] refcount_warn_saturate from __drm_atomic_helper_plane_destroy_state+0xd0/0xd4
> [ 98.629740] __drm_atomic_helper_plane_destroy_state from atmel_hlcdc_plane_atomic_destroy_state+0x38/0x48
> [ 98.641947] atmel_hlcdc_plane_atomic_destroy_state from drm_atomic_state_default_clear+0x1c4/0x2fc
> [ 98.653545] drm_atomic_state_default_clear from __drm_atomic_state_free+0x7c/0xb0
> [ 98.663660] __drm_atomic_state_free from drm_framebuffer_remove+0x48c/0x540
> [ 98.673252] drm_framebuffer_remove from drm_mode_rmfb_work_fn+0x68/0x84
> [ 98.682495] drm_mode_rmfb_work_fn from process_one_work+0x1b4/0x3f4
> [ 98.691390] process_one_work from worker_thread+0x214/0x4e8
> [ 98.699587] worker_thread from kthread+0xb4/0xd8
> [ 98.706824] kthread from ret_from_fork+0x14/0x28
> [ 98.714060] Exception stack(0xc88adfb0 to 0xc88adff8)
> [ 98.719125] dfa0: 00000000 00000000 00000000 00000000
> [ 98.727327] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 98.735520] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 98.742219] ---[ end trace 0000000000000000 ]---
>
> The first one is not deterministic, you have to try a few times to trigger
> it. The second one is a hit every time.
>
> Same commands on a kernel without the patch don't report any underflow.
> Note the commit in the first command also fails on a kernel without the
> patch, which I guess is expected as plane 33 is the primary plane and I'm
> trying set dimensions that do not match the size of the display. The commit
> succeeds when invoking with the correct dimensions, but then I can't make
> it produce an underflow. Same with the second command.
>
> It seems to only trigger once per boot. Running the commands again does not
> yield another underflow.
>
> Looking at the disassembly, this is an underflow of the drm_crtc_commit's
> refcount this time. In the warning you had, it was on a framebuffer object.
>
> Anyway, I'll go back to the drawing board, study more closely the resource
> release part of the driver. Thanks for having brought up the issues with
> the patch.
Hello,
Sorry for the late reply, I was off previous weeks. Late me know if you
need more information or testing on my side.
From the top of my head, the crash was systematic on my side and I
think I get it with any EGT application.
I use an EGT app to check if the memory leak is still here. It
continuously change the text of a button causing many calls of
drm_atomic_helper_setup_commit. After a while, we can see that the
virtual memory is constant while the system memory decreases. At least,
your patch sounds to fix this memory leak.
Regards,
Ludovic
>
> [0]: Namely, linux4microchip-2023.10 (6.1.55), v6.8, and drm-next-2023-03-13
> [1]: https://github.com/linux4sam/egt/tree/master/examples
>
> Pierre-Louis