Re: [PATCH v1] kernel: add a simple timer based software watchpoint
From: Feng Tang
Date: Mon Jun 22 2026 - 08:46:10 EST
Hi David,
On Mon, Jun 22, 2026 at 10:42:06AM +0200, David Hildenbrand (Arm) wrote:
> On 6/22/26 10:14, Feng Tang wrote:
> > During debugging some bios/hardware related nasty memory corruption
> > issues, we found using periodic timer to monitor specific dram/mmio
> > physical address is very useful for debugging, which acts like
> > a basic software watchpoint.
> >
> > For those bugs, who (and when) change(corrupt) those dram or mmio
> > register is hard to trace, and sometimes even hardware jtag debugger
> > can't help (say the physical address watchpoint doesn't work).
> >
> > The biggest shortcoming of this method is it can never capture the
> > exact point like a hardware watchpoint. The idea is trying to
> > approach the point by adjusting the timer interval, hoping the caught
> > context have enough debug info (which did help us in solving quite
> > some bios/hardware bugs)
> >
> > The working flow is simple: after suspected address is identified,
> > start periodic timer polling it to catch if its value is changed to
> > target 'magic' value, then halt the cpu (better limit to have only
> > one cpu online), or panic, or print out system information, so that
> > the error environment is frozen for further check, or let
> > kexec/kdump to record the vmcore, etc.
> >
> > One real use case was:
> > "
> > On an arm64 platform, some BIOS/HW config caused OS boot easy to
> > stall in systemd init phase, then the reproducing was simplified it
> > by making it boot to console with a function-reduced rootfs, and
> > always triggering 'segmentation fault' when running 'less' command.
> >
> > By using GDB, some static array of 'less' is found corrupted before
> > being written, and one byte in its memory is always '0x33'. At this
> > stage the static array is in bss segment first, and backed by kernel
> > zero page after first read, so it was an obvious memory corruption.
> >
> > HW engineers tried to capture HW traces after the issue happened, but
> > could not find valuable hints, as the corruption could happen long
> > before 'less' is run, and the trace/context of that time was gone.
> >
> > As physical address of kernel zero page was known, and offset of the
> > corrupted byte was fixed, the address was A. But HW debugger failed
> > to breakpoint the point that address A was written with '0x33'. Then
> > this method was used to monitor 'writing 0x33 to A' with 30ms
> > interval, and halted the system by 'while (1);' (the system was made
> > to a UP by using 'nr_cpus=1' cmdline parameter) once hit, then HW
> > people collected the HW trace they need and root caused it to be a
> > bad config.
> > "
> >
> > The culprits of memory corruption issues we met are mainly:
> > * broken devices (like ethernet card)
> > * BIOS runtime service
> > * silicon bugs
> > * kernel itself
> >
> > As the kernel already has many useful debug features like slub_debug,
> > kasan, kfence, kmemleak etc., this method could be a better fit for
> > the first three types.
> >
> > All the settings are module parameters:
> >
> > watch_interval_ms: SW watchpoint check interval in ms
> > paddr_dram_to_watch: Physical dram address to monitor.
> > target_dram_val: Expected value at the dram address that triggers the watchpoint.
> > paddr_mmio_to_watch: Physical mmio address to monitor. Must be 32-bit aligned.
> > target_mmio_val: Expected value at the mmio address that triggers the watchpoint.
> > panic_on_hit: Trigger kernel panic when watchpoint condition hits.
> > hang_on_hit: halt the CPU (wait for HW debugger)
> >
> > This provides the basic function, and there are more todos:
> > * add a ftrace mode to do function level monitoring with ftrace hook,
> > which is more accurate timing wise, as suggested by Steven Rostedt
> > * merge the dram/mmio interface to auto detect it's dram or mmio
> > * support runtime changing the address
> > * move the starting point earlier in boot phase
> > * monitor a whole memory region
> > * currently is monitoring 'changing to a value', add support
> > for 'changing from a value'
>
> That really looks more like the kind of thing you would want to carry as a OOT
> hack for your special debugging needs :)
Thanks for the review, and for raising the very good question of whether
the patch is worthwhile!
Personally, I think it is.
Firstly, IMHO, memory corruption is a common issue. Petr asked me to give
some real-world examples, and I talked about two cases in
[1]. https://lore.kernel.org/lkml/aiaRLiaNs9M9c2q-@U-2FWC9VHC-2323.local/
I have run into more memory corruptions than those, and I would divide
them into two types: kernel-triggered and non-kernel
(device/BIOS/silicon) triggered.
For non-kernel triggered: these mainly come from devices (ethernet cards,
USB controllers, etc.). Many devices may corrupt kernel memory during DMA
transfers, suspend/resume cycles or warm reboots, like some versions of
the Aquantia NIC, the mainstream MLXCX5 NIC, or octeon-hcd, which are all
general hardware devices:
[2]. https://bugzilla.kernel.org/show_bug.cgi?id=217854
[3]. https://lore.kernel.org/lkml/20221130085451.3390992-2-feng.tang@xxxxxxxxx/
For kernel triggered:
* kernel array overflows are a common case of memory corruption (which
should be easily captured by a watchpoint backed by the ftrace method)
* the kernel zero page case, whose details I don't know, but I guess it
comes from some corruption of the zero page similar to what I met.
[4]. https://lore.kernel.org/all/20260526175846.2694125-31-ardb+git@xxxxxxxxxx/
My own experience is very limited, and I think there should be many
more real-world memory corruption cases than I know of.
And IMHO, the non-kernel triggered ones are more difficult to handle
than the kernel triggered ones, as the symptoms are more random and it
is hard to generalize a pattern.
Secondly, it is low cost and could be built into production kernels.
* It is under a "default n" config, so it won't bother normal users/OSVs
* The binary is small when built into the kernel, and it won't cost
kernel cycles when not enabled
* For production kernels running on big server fleets, the
kasan/slub_debug options usually won't be enabled as they affect
performance, so they can't help with many kernel-triggered memory
corruptions.
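
To make the cost argument concrete, here is a minimal user-space sketch of
what one polling tick amounts to. It is not the code from the patch: the
struct, enum and function names are hypothetical, the watched address is
assumed to be already mapped, and the priority of hang over panic when both
are set is my assumption. Only the parameter names (target value,
panic_on_hit, hang_on_hit) follow the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action codes; the real module would halt, panic or print. */
enum wp_action { WP_NONE, WP_REPORT, WP_PANIC, WP_HANG };

/* Hypothetical state mirroring the module parameters in the patch text. */
struct sw_watchpoint {
	const volatile uint32_t *addr;	/* mapped address to watch, NULL = disabled */
	uint32_t target_val;		/* 'magic' value that triggers the hit */
	bool panic_on_hit;
	bool hang_on_hit;
};

/*
 * Work done on one timer tick: a single read and compare.
 * When no address is configured, nothing is read at all.
 */
static enum wp_action sw_wp_poll(const struct sw_watchpoint *wp)
{
	if (!wp->addr)
		return WP_NONE;

	if (*wp->addr != wp->target_val)
		return WP_NONE;

	/* Assumption: hang takes priority over panic when both are set. */
	if (wp->hang_on_hit)
		return WP_HANG;
	if (wp->panic_on_hit)
		return WP_PANIC;

	return WP_REPORT;
}
```

In the module, a check like this would run from the periodic timer every
watch_interval_ms. How the physical address gets mapped, and the actual
hang/panic/print handling, are left out of the sketch.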
Thanks,
Feng
>
> --
> Cheers,
>
> David