Re: Re: Re: Re: [PATCH] media: v4l2-async: debugfs for registered subdevices

From: luo.liu.linux

Date: Mon Mar 16 2026 - 23:40:18 EST




Hi Sakari,

You are absolutely right. For an experienced kernel developer like yourself, tools like KASAN and CONFIG_DEBUG_LIST are second nature and incredibly effective for pinpointing such issues.
I truly admire your expertise in leveraging these advanced debugging mechanisms.

However, I think it is important to consider the reality for many junior driver developers (myself included). We often lack the deep intuition and extensive experience required to wield these
powerful tools effectively in every scenario. More often than not, we still rely on primitive methods: struggling to reproduce intermittent crashes, scattering printk logs everywhere, and manually
tracing execution paths. This process is extremely time-consuming and often yields no clear conclusions for "silent" resource leaks.

While I am actively working to improve my skills and learn to use these advanced tools more proficiently, I remain convinced that a simple, intuitive
interface like this is a useful supplement, serving as a low-barrier entry point for developers.
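
As a rough illustration of the kind of check I have in mind, here is a minimal userspace sketch in C. It is not the kernel code or the patch itself:
struct fake_subdev and the fake_register(), fake_unregister() and fake_dump() helpers are hypothetical names that only model a global list whose
entries can be dumped by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a registered subdevice; NOT the kernel struct. */
struct fake_subdev {
	const char *name;
	struct fake_subdev *next;
};

/* Models a framework-global list such as subdev_list. */
static struct fake_subdev *subdev_list;

/* Add an entry at the head of the global list. */
void fake_register(struct fake_subdev *sd)
{
	assert(sd && sd->name);
	sd->next = subdev_list;
	subdev_list = sd;
}

/* Unlink an entry; this is the step a buggy remove() path forgets. */
void fake_unregister(struct fake_subdev *sd)
{
	struct fake_subdev **p;

	for (p = &subdev_list; *p; p = &(*p)->next) {
		if (*p == sd) {
			*p = sd->next;
			sd->next = NULL;
			return;
		}
	}
}

/*
 * Write one entry name per line into buf and return the number of
 * entries written, similar in spirit to reading a debugfs file.
 */
int fake_dump(char *buf, size_t len)
{
	const struct fake_subdev *sd;
	size_t used = 0;
	int n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (sd = subdev_list; sd; sd = sd->next) {
		int w = snprintf(buf + used, len - used, "%s\n", sd->name);

		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the truncated entry */
			break;
		}
		used += (size_t)w;
		n++;
	}
	return n;
}
```

If three entries named "sensor", "dphy" and "isp" are registered and only the sensor and ISP entries are unregistered, fake_dump() still reports one
entry, named "dphy". Seeing the leftover name directly is the convenience I am arguing for.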

I hope this perspective clarifies why I believe this small change can bring a bit of convenience to a broader range of driver developers.

Best regards,
Luo



At 2026-03-17 00:48:07, "Sakari Ailus" <sakari.ailus@xxxxxxxxxxxxxxx> wrote:
>Hi Luo,
>
>On Fri, Mar 13, 2026 at 09:50:56PM +0800, luo.liu.linux wrote:
>>
>> Hi Sakari,
>>
>> Apologies if my previous explanation wasn't clear enough.
>>
>> To clarify, the primary goal of this interface is not merely to verify if insmod/rmmod succeeds,
>> but to validate the correctness of the asynchronous subdevice registration and unregistration paths,
>> specifically ensuring that resource allocation and reclamation are handled properly.
>>
>> I would like to share a real-world scenario that motivated this patch:
>>
>> We had a camera pipeline (sensor -> dphy -> mipi-csi2 -> isp) whose subdevice drivers
>> appeared to function perfectly for six months. insmod and rmmod completed without any errors,
>> and the system seemed stable during normal operation. However, just before a major release, a QA engineer performed
>> stress testing involving rapid, repeated cycles of insmod and rmmod, which eventually triggered a kernel crash.
>>
>> During the debugging process, I inspected the internal global lists:
>>
>> static LIST_HEAD(subdev_list);
>> static LIST_HEAD(notifier_list);
>>
>> By dumping the subdev_list via this debugfs interface, I discovered that a D-PHY subdevice entry remained in the list even
>> after its driver was unloaded. Crucially, the output explicitly showed the device name, allowing me to immediately pinpoint
>> the D-PHY driver as the culprit, rather than blindly troubleshooting other components in the pipeline (such as the sensor or ISP).
>>
>> This was the critical clue that led me to the root cause:
>>
>> The D-PHY subdriver's remove function was missing a call to v4l2_async_cleanup(sd). Consequently, the subdevice was never properly
>> unregistered from the async framework, leading to a use-after-free or stale pointer issue during the stress test.
>>
>> Without this debugfs interface, detecting such "silent" registration leaks is extremely difficult.
>> The driver loads and unloads without reporting errors, and standard logs (dmesg) often provide
>> no indication that an entry was left behind in the core framework's list until a crash occurs under specific timing conditions.
>>
>>
>> Given this experience, I believe this interface provides a vital visibility point for engineers to:
>>
>> 1. Verify that subdevices are correctly removed from the global list upon driver unload.
>> 2. Catch missing cleanup calls (like v4l2_async_cleanup) early in the development cycle, rather than discovering them through random crashes in stress testing.
>
>I guess you'd have found this with either KASAN or linked list debugging?
>
>--
>Sakari Ailus