Segmentation fault on Linux 2.3.99 IA-64 platform

From: yiding_wang@agilent.com
Date: Wed Jul 05 2000 - 14:00:22 EST


I have been struggling with this problem for about two weeks and suspect
that it comes from the MMU handling in the Linux 2.3.99 IA-64 kernel.

Background:
We have a Fibre Channel HBA driver for Linux. The driver has been tested
on RH6.0 and RH6.1 without any problems. With the same driver code base plus
added IA-64 support, the driver works on the Linux 2.3.99 IA-64 kernel
released by TurboLinux. However, when running a looped I/O test for a few
hours, a segmentation fault always occurs and the test stops. The hardware
we are using is BigSur and Lions systems of different steppings.

From the driver side, I have checked all data structures, SCSI command
blocks, DMA addresses and data lengths, and found nothing wrong. When the
test stopped, the last few I/Os had also completed without any error. For
debugging and comparison, we have run the loop I/O test with the following
commands:
- cp and diff: ran a few hundred loops and stopped after a couple of hours
  due to a SIGSEGV signal;
- cat and diff: ran longer than with "cp" but stopped the same way;
- dd (raw I/O): each loop took about 15 minutes; it ran for about 6 hours
  and eventually ended with the same problem.

I have also tested the QLogic driver Qla1280, which is the primary driver
for the IA-64 platform. With the same "cp and diff" loop test, the same
segmentation fault occurs after about 5 hours of looping (a few thousand
loops).

One thing is common to all of these tests: the core dump generated by the
segmentation fault always shows the same faulting address. With gdb -c core,
the following lines appear in every core file:
Program terminated with Signal 11, Segmentation fault.
#0 0x20000000001185a0 in ?? ()
#1 0x20000000000df400 in ?? ()
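On IA-64 the top three bits (63..61) of a virtual address select one of eight regions, and the remaining 61 bits are the offset within that region. The Python sketch below (the helper name ia64_region is hypothetical) splits the two frame addresses above into region number and offset. It only does the bit arithmetic and does not identify which object was mapped at those addresses.

```python
# IA-64: bits 63..61 of a virtual address are the region number,
# bits 60..0 are the offset within that region.
REGION_SHIFT = 61
OFFSET_MASK = (1 << REGION_SHIFT) - 1


def ia64_region(addr):
    """Return (region number, offset within region) for a 64-bit IA-64 address."""
    return addr >> REGION_SHIFT, addr & OFFSET_MASK


# The two frames reported by gdb in every core file.
for frame, addr in enumerate((0x20000000001185A0, 0x20000000000DF400)):
    region, offset = ia64_region(addr)
    print(f"#{frame} region {region}, offset {offset:#x}")
```

Both frames decode to region 1. Assuming the usual Linux/ia64 layout, in which regions 5 to 7 are reserved for the kernel, this places both frames in user space; that alone does not rule out a kernel bug, since the kernel could still have corrupted the process's memory.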

Using strace, all mmap, brk, open, close and other system calls are the
same between a normally completed I/O test and one stopped by the
segmentation fault, except for the process ID returned by getpid(). The
normal run completes, but the failed one stops at either a read or a write
system call.

Since the segmentation fault could originate either in the user application
or in kernel space (probably not in the driver), I suspect a problem in the
IA-64 kernel MMU. This problem has never happened on 32-bit Linux.

I understand that not many people are working on IA-64 Linux yet, but I hope
someone who is involved in the kernel can shed some light!

Many thanks!

Eddie Wang
Agilent Technologies
350-370 West Trimble Road
MS 90TZ
San Jose, Ca 95131-1008
Phone: (408) 435-4213
Fax: (408) 435-5838
E-Mail: yiding_wang@agilent.com
   

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jul 07 2000 - 21:00:17 EST