Re: [PATCH] lib/sys_info: add a simple timer based memory corruption detector
From: Steven Rostedt
Date: Wed Jun 10 2026 - 13:04:03 EST
On Wed, 27 May 2026 11:43:24 +0800
Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx> wrote:
> During debugging some bios/hardware related nasty memory corruption
> issues, we found using periodic timer to monitor specific dram/mmio
> physical address is very useful for debugging, which acts like
> a basic software watchpoint.
>
> For those bugs, who (and when) change(corrupt) those dram or mmio
> register is hard to trace, and sometimes even hardware jtag debugger
> can't help (say the physical address watchpoint doesn't work).
>
> The biggest shortcoming is it can never capture the exact point like
> a hardware watchpoint, no matter how small the timer interval is set,
> the idea is trying to approach the point, hoping the caught context
> have enough debug info (which did help us in solving bios/hardware
> bugs)
Instead of using a timer, can't you use the function tracer? That is,
have the value checked at *every* function call. You can easily add a
custom callback that gets called when every function is executed.
See https://docs.kernel.org/trace/ftrace-uses.html
>
> The working flow is simple: after suspected address is identified,
> start periodic timer polling it to catch if its value is changed to
> target 'magic' value, then halt the cpu (better limit to have only
> one cpu online), or panic, or print out system information, so that
> the error environment is frozen for further check , or let
> kexec/kdump to record the vmore, etc.
>
> All the settings are module parameters:
>
> watch_interval_ms: SW watchpoint check interval in ms
> paddr_dram_to_watch: Physical dram address to monitor.
> target_dram_val: Expected value at the dram address that triggers the watchpoint.
> paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
> target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
> panic_on_hit: Trigger kernel panic when watchpoint condition hits.
> hang_on_hit: halt the CPU (wait for HW debugger)
>
> This RFC is trying to show the idea and get feedback, and there are
> some todos:
> * merge the dram/mmio interface to auto detect it's dram or mmio
> * support runtime changing the address
> * move the starting point earlier in boot phase
> * currently is monitoring 'changing to a value', add support
> for 'changing from a value'
>
> Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx>
> +static void sw_watchpoint_timer_fn(struct timer_list *unused)
> +{
> + bool hit = false;
> +
> + if (vaddr_mmio && (*vaddr_mmio == target_mmio_val)) {
> + pr_info("MMIO [@0x%lx] hit the target value [0x%x]!\n",
> + paddr_mmio_to_watch, target_mmio_val);
> + hit = true;
> + }
> +
> + if (vaddr_dram && (*vaddr_dram == target_dram_val)) {
> + pr_info("DRAM [@0x%lx] hit the target value [0x%lx]!\n",
> + paddr_dram_to_watch, target_dram_val);
> + hit = true;
> + }
> +
> + if (hit) {
> + sys_info(0);
> +
> + /* Useful for attaching HW debugger */
> + if (hang_on_hit) {
> + pr_warn("Will dead loop on this CPU\n");
> + while (1);
> + }
> +
> + /* Could be used to trigger kexec/kdump */
> + if (panic_on_hit)
> + panic("SW watchpoint hit!");
> +
> + if (check_once)
> + return;
> + }
The above function would be:
static bool no_check;
static void sw_watchpoint_fn(unsigned long ip, unsigned long pip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
bool hit = false;
int bit;
if (no_check)
return;
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
[ do the above code ]
if (check_once) {
// possibly call irq_work to unregister ftrace
no_check = true;
}
ftrace_test_recursion_unlock(bit);
}
static struct ftrace_ops sw_watchpoint_fops = {
.func = sw_watchpoint_fn;
};
[..]
> +
> + mod_timer(&sw_watchpoint_timer, jiffies + msecs_to_jiffies(watch_interval_ms));
> +}
> +
> +static int __init sw_watchpoint_timer_init(void)
> +{
> + if (paddr_mmio_to_watch) {
> + vaddr_mmio = ioremap(paddr_mmio_to_watch & PAGE_MASK, PAGE_SIZE);
> + if (!vaddr_mmio)
> + return -ENOMEM;
> +
> + vaddr_mmio += (paddr_mmio_to_watch % PAGE_SIZE) / 4;
> + }
> +
> + if (paddr_dram_to_watch) {
> + vaddr_dram = phys_to_virt(paddr_dram_to_watch);
> + if (!vaddr_dram)
> + return -ENOMEM;
> + }
> +
> + timer_setup(&sw_watchpoint_timer, sw_watchpoint_timer_fn, 0);
> + sw_watchpoint_timer.expires = jiffies + msecs_to_jiffies(watch_interval_ms);
> + add_timer(&sw_watchpoint_timer);
Instead of the above, have:
register_ftrace_function(&sw_watchpoint_fops);
-- Steve
> +
> + return 0;
> +}
> +core_initcall(sw_watchpoint_timer_init);
> +#endif
>
> base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d