Re: [PATCH v1] kernel: add a simple timer based software watchpoint

From: Feng Tang

Date: Fri Jun 26 2026 - 02:50:28 EST

On Fri, Jun 26, 2026 at 09:56:22AM +0800, Feng Tang wrote:
> > No. This is just all catching the problem after the fact with no trace
> > and conclusive information about the root cause. The tools are there,
> > you just have to use them correctly. But sure creating magic hacks which
> > by chance give you the same information is way better...
>
> This issue was interesting. It showed up as a NULL pointer panic, and I
> found that a global variable (in the bss segment) was being corrupted, which
> logically shouldn't happen. As it didn't happen on normal platforms, only on
> one platform with a special config, we thought it could be silicon related
> and sent it to the silicon team. By gathering and analyzing silicon traces,
> they root-caused it to an array overflow: the special config makes that
> array much longer.
>
> My thought was that with this method, I could have seen the corruption
> happen right after the initialization of the module that owns that array.

In the RFC review, Steven suggested using an ftrace hook instead of a timer.
If we inject the monitoring into the ftrace hook on the function exit path,
it should be able to catch exactly which function corrupts this variable.
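
To make the exit-path idea concrete, here is a rough userspace sketch. It is
not ftrace code and not part of the patch: struct sw_watch, sw_watch_on_exit(),
TRACED_CALL() and the init_*() functions are all hypothetical names, and the
macro only stands in for whatever exit hook ftrace would provide. The check
itself is just a compare of the watched bytes against the last snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical watchpoint state: address, length, last known-good copy. */
struct sw_watch {
	const void *addr;
	size_t len;
	unsigned char snap[16];
	const char *culprit;	/* first function whose exit saw a change */
};

static void sw_watch_init(struct sw_watch *w, const void *addr, size_t len)
{
	assert(len <= sizeof(w->snap));
	w->addr = addr;
	w->len = len;
	memcpy(w->snap, addr, len);
	w->culprit = NULL;
}

/* Stand-in for the exit-path hook: compare watched bytes to the snapshot. */
static void sw_watch_on_exit(struct sw_watch *w, const char *func)
{
	if (memcmp(w->addr, w->snap, w->len) == 0)
		return;
	if (!w->culprit)
		w->culprit = func;
	memcpy(w->snap, w->addr, w->len);	/* re-arm for later changes */
}

/* Call fn(), then run the exit check, as an exit hook would. */
#define TRACED_CALL(w, fn) do { fn(); sw_watch_on_exit((w), #fn); } while (0)

static int dummy_target;
static int *watched_ptr = &dummy_target;	/* stands in for the bss variable */

static void init_good(void) { }
static void init_buggy(void) { watched_ptr = NULL; }	/* simulated stray write */
static void init_late(void) { }

/* Returns the name of the function blamed for the corruption. */
static const char *sw_watch_demo(void)
{
	struct sw_watch w;

	watched_ptr = &dummy_target;
	sw_watch_init(&w, &watched_ptr, sizeof(watched_ptr));
	TRACED_CALL(&w, init_good);
	TRACED_CALL(&w, init_buggy);
	TRACED_CALL(&w, init_late);
	return w.culprit;
}
```

In this sketch sw_watch_demo() blames init_buggy, since its exit is the first
one that sees a changed value. Any change counts as corruption here, so a real
version would presumably need a way to skip legitimate writers of the variable.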

Thanks,
Feng