Re: [PATCH] Check for breakpoint in text_poke to eliminate bug_on

From: Mathieu Desnoyers
Date: Sat Apr 19 2008 - 20:05:44 EST


* Pekka Paalanen (pq@xxxxxx) wrote:
> On Sat, 19 Apr 2008 17:58:26 -0400
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:
>
> > * Pekka Paalanen (pq@xxxxxx) wrote:
> > >
> > > A simple
> > > echo 0 > /sys/devices/system/cpu/cpu1/online
> > > echo 1 > /sys/devices/system/cpu/cpu1/online
> > >
> > > produces the following kernel log (netconsole), and then, after a hang
> > > of a couple of seconds, the machine reboots:
> > >
> > > [ 84.678357] console [netcon0] enabled
> > > [ 84.679568] netconsole: network logging started
> > > [ 232.812335] CPU 1 is now offline
> > > [ 232.812678] lockdep: fixing up alternatives.
> > > [ 232.813051] SMP alternatives: switching to UP code
> > > [ 268.447582] lockdep: fixing up alternatives.
> > > [ 268.447903] SMP alternatives: switching to SMP code
> > > [ 268.459462] Booting processor 1/1 ip 6000
> > >
> > > My kernel is sched-devel/latest git tree with Desnoyers' patch, and my
> > > patches that touch only arch/x86/mm/mmio-mod.c.
> > > The machine is Thinkpad T61 with:
> > >
> > > processor : 0
> > > vendor_id : GenuineIntel
> > > cpu family : 6
> > > model : 15
> > > model name : Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
> > > stepping : 10
> > > cpu MHz : 2001.000
> > > cache size : 4096 KB
> > > physical id : 0
> > > siblings : 2
> > > core id : 0
> > > cpu cores : 2
> > > apicid : 0
> > > initial apicid : 0
> > > fpu : yes
> > > fpu_exception : yes
> > > cpuid level : 10
> > > wp : yes
> > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> > > lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est
> > > tm2 ssse3 cx16 xtpr lahf_lm ida
> > > bogomips : 3997.03
> > > clflush size : 64
> > > cache_alignment : 64
> > > address sizes : 36 bits physical, 48 bits virtual
> > > power management:
> > >
> > > processor : 1
> > > vendor_id : GenuineIntel
> > > cpu family : 6
> > > model : 15
> > > model name : Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
> > > stepping : 10
> > > cpu MHz : 2001.000
> > > cache size : 4096 KB
> > > physical id : 0
> > > siblings : 2
> > > core id : 1
> > > cpu cores : 2
> > > apicid : 1
> > > initial apicid : 1
> > > fpu : yes
> > > fpu_exception : yes
> > > cpuid level : 10
> > > wp : yes
> > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > > cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> > > lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est
> > > tm2 ssse3 cx16 xtpr lahf_lm ida
> > > bogomips : 3991.26
> > > clflush size : 64
> > > cache_alignment : 64
> > > address sizes : 36 bits physical, 48 bits virtual
> > > power management:
> > >
> > > Any help would be appreciated.
> > >
> > >
> > > Thanks.
> >
> > This patch should bring more consistency checks to text_poke, can you
> > give it a try ?
> >
> > Hm, actually, I think it contains the fix you are looking for.
> >
> > kernel_text_address -> core_kernel_text will probably make everything go
> > smoothly.
> >
> > Mathieu
> >
> >
> > Check for breakpoint in text_poke to eliminate bug_on
> >
> > It's ok to modify an instruction non-atomically (multiple memory accesses to a
> > large and/or non aligned instruction) *if and only if* we have inserted a
> > breakpoint at the beginning of the instruction.
> >
> > Also change kernel_text_address (bogus) check to core_kernel_text.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>
> > ---
> > arch/x86/kernel/alternative.c | 49 ++++++++++++++++++++++++------------------
> > 1 file changed, 29 insertions(+), 20 deletions(-)
>
> Sorry, no change.
>
> [ 93.315242] netconsole: network logging started
> [ 95.797496] eth0: no IPv6 routers present
> [ 127.472213] CPU 1 is now offline
> [ 127.472547] lockdep: fixing up alternatives.
> [ 127.472923] SMP alternatives: switching to UP code
> [ 134.709384] lockdep: fixing up alternatives.
> [ 134.709701] SMP alternatives: switching to SMP code
> [ 134.721344] Booting processor 1/1 ip 6000
>
> A few seconds' pause and then it reboots. A working patch in six minutes
> would have been quite awesome :-)
>

Hrm, I'll have to do some more tests.

Can you mail your .config please?
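
For reference, the rule the patch description relies on (a large or
unaligned instruction may be rewritten with several memory accesses only
while a breakpoint sits on its first byte) can be sketched in plain
userspace C. This is only an illustration on an ordinary byte buffer, not
the code from the patch: poke_with_breakpoint(), nonatomic_write_allowed()
and INT3_OPCODE are made-up names, and the breakpoint handler and the
cross-CPU synchronization a real kernel needs between the steps are left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INT3_OPCODE 0xCC	/* x86 one-byte breakpoint instruction */

/*
 * Hypothetical helper, not taken from the patch: replace the len-byte
 * instruction at addr with opcode in three steps, so that the first
 * byte is a breakpoint for as long as the tail is inconsistent.
 */
static void poke_with_breakpoint(unsigned char *addr,
				 const unsigned char *opcode, size_t len)
{
	if (len == 0)
		return;

	/* 1. Single-byte store: put the breakpoint on the first byte. */
	addr[0] = INT3_OPCODE;

	/* 2. The tail may now be written with several memory accesses. */
	if (len > 1)
		memcpy(addr + 1, opcode + 1, len - 1);

	/* 3. Single-byte store: the new instruction becomes visible. */
	addr[0] = opcode[0];
}

/*
 * Consistency check in the spirit of the patch title: a multi-access
 * write is acceptable only if the breakpoint is already in place.
 */
static int nonatomic_write_allowed(const unsigned char *addr)
{
	return addr[0] == INT3_OPCODE;
}
```

Calling poke_with_breakpoint() on a buffer of five nops with a five-byte
jmp as the new opcode leaves the buffer equal to the jmp, and
nonatomic_write_allowed() only returns nonzero while the first byte is
0xCC.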

Mathieu

>
> Thanks.
>
> --
> Pekka Paalanen
> http://www.iki.fi/pq/

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68