stuck on TLB IPI wait

From: Vladimir Ivaschenko (hazard@francoudi.com)
Date: Tue Aug 28 2001 - 10:10:23 EST


Hi All,

Please CC any answers to my e-mail, as I'm not a subscriber to the list.

I run two nodes with identical RH-based 2.2.19 kernels with ext3 patches
in a cluster using "heartbeat". I also use InterMezzo module. The nodes
are Compaq DL360, 2x933 Mhz PIII, 2x36 GB HDDs on Compaq Smartarray. Node
#1 has 256MB RAM, node #2 has 512MB RAM. Distribution is RedHat 6.2.

/proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 930.465
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr xmm
bogomips : 1854.66

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 930.465
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr xmm
bogomips : 1854.66

>From time to time, on node #2 (which is a stand-by node) I get the
following message:

Aug 15 19:55:01 mail2 kernel: stuck on TLB IPI wait (CPU#3)
Aug 15 19:55:01 mail2 last message repeated 5 times

Followed by messages from heartbeat the mail1 is down and is it taking
over the resouces. After that everything functions normally. Please note
that I'm running with "noapic", otherwise system often freezes when it
initializes the network cards - I can press the keys, but apparently all
IO is blocked as there is no response. These "freezes" happen with all
kernels I tried, both RedHat and clean kernels from linux.org.

Aug 15 19:55:03 mail2 heartbeat: info: Running /etc/ha.d/rc.d/shutdone shutdone
Aug 15 19:55:11 mail2 heartbeat[554]: WARN: node mail1.X.X: is dead
Aug 15 19:55:11 mail2 heartbeat: info: Running /etc/ha.d/rc.d/status status
Aug 15 19:55:11 mail2 heartbeat: info: Taking over resource group X.X.X.193
Aug 15 19:55:12 mail2 heartbeat: info: Acquiring resource group: mail1.X.X X.X.X.193 isinglass
Aug 15 19:55:12 mail2 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr X.X.X.193 start
Aug 15 19:55:12 mail2 heartbeat: info: ifconfig eth0:0 X.X.X.193 netmask 255.255.255.0^Ibroadcast X.X.X.255
Aug 15 19:55:12 mail2 heartbeat: info: Sending Gratuitous Arp for X.X.X.193 on eth0:0 [eth0]
Aug 15 19:55:12 mail2 heartbeat: info: Running /etc/ha.d/resource.d/isinglass start
Aug 15 19:55:13 mail2 heartbeat: info: mach_down takeover complete.
Aug 15 19:55:13 mail2 heartbeat[554]: info: Link mail1.X.X:eth1 dead.
Aug 15 19:55:13 mail2 heartbeat: info: Running /etc/ha.d/rc.d/ifstat ifstat

Please let me know if any other information is needed.

-- 
Best Regards
Vladimir Ivaschenko

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Aug 31 2001 - 21:00:29 EST