ext2fs corruption under heavy load?

Peter Rival (frival@zk3.dec.com)
Tue, 01 Jun 1999 10:11:05 -0400


Hi,

I've been attempting to benchmark Linux (2.2.9 + AXP SMP patches) on a
2-CPU AS4100. Everything runs fine, and the numbers are quite good,
until somewhere in the (simulated) 50-60 user range. I'm running
AIM VII on a system with 26 disks attached (does this sound familiar
again? ;), so while IO isn't the problem, we are still beating on the
filesystem (fserver benchmark). Anyway, once I get into this user
range, I start seeing errors in this code path:

    if ((fd = creat(flist[CREAT][index],
                    S_IRWXU | S_IRWXG | S_IRWXO)) < 0) { /* try create */
        perror("creat() in dsearch()");           /* handle error */
        sprintf(errbuf,"dsearch():can't creat '%s'\n",
                flist[CREAT][index]);             /* build error message */
        chdir(cwd);                               /* change directories */
        cl_list(flist);                           /* clear list */
        return(-1);                               /* return error */
    }                                             /* end of error */
    close(fd);                                    /* close the file */
    if (unlink(flist[CREAT][index])) {            /* unlink it */
        perror("unlink() in dsearch()");          /* handle error */
        getcwd(ncwd,256);
        chdir(cwd);                               /* change directories */
        cl_list(flist);                           /* clear list */
        return(-1);                               /* return error */
    }                                             /* end of error */

The unlink() fails with "unlink() in dsearch(): No such file or
directory". Well, that's true enough...the file doesn't actually exist,
even though the creat() just before it succeeded. The strange part is
that in some of the filesystems, I wind up with unattached inodes at
the next fsck (one, every time...). As I have said, this has happened
on almost all of the 26 work disks at one time or another, and on all
4 of the SCSI controllers (three QLogic ISP 1020s and one 1040). I can
reproduce this problem every time I run the benchmark.
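
A standalone reproducer might help separate the benchmark from the
kernel here. What follows is only a minimal sketch, not code from
AIM VII: the names creat_unlink_cycles() and run_workers(), the
"stress.<pid>" file name, and the worker/iteration counts are all
assumptions made for illustration. Each forked worker repeats the same
creat()/close()/unlink() sequence as dsearch() on its own file, so on
a healthy filesystem every unlink() should succeed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create, close and unlink one per-process file
 * `iterations` times inside `dir`. Returns the number of failed cycles. */
int creat_unlink_cycles(const char *dir, int iterations)
{
    char path[4096];
    int failures = 0;
    int i;

    /* The file name is unique per process, so workers never share a file. */
    snprintf(path, sizeof(path), "%s/stress.%ld", dir, (long)getpid());

    for (i = 0; i < iterations; i++) {
        int fd = creat(path, S_IRWXU | S_IRWXG | S_IRWXO);
        if (fd < 0) {
            fprintf(stderr, "creat(%s): %s\n", path, strerror(errno));
            failures++;
            continue;
        }
        close(fd);
        if (unlink(path) != 0) {
            /* This is the failure described above (ENOENT after creat). */
            fprintf(stderr, "unlink(%s): %s\n", path, strerror(errno));
            failures++;
        }
    }
    return failures;
}

/* Hypothetical driver: fork `nworkers` children that each run the cycle
 * concurrently. Returns the number of workers that failed or could not
 * be started. */
int run_workers(const char *dir, int nworkers, int iterations)
{
    int started = 0;
    int bad = 0;
    int w;

    for (w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0)
            _exit(creat_unlink_cycles(dir, iterations) ? 1 : 0);
        started++;
    }

    /* Reap every child that was started and count abnormal exits. */
    for (w = 0; w < started; w++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            bad++;
    }
    return bad + (nworkers - started);
}
```

Calling something like run_workers("/mnt/work01", 60, 10000) from a
small main() on one of the affected filesystems (the mount point is a
placeholder) and checking for a nonzero return would show whether the
failure appears without the rest of the benchmark.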

I have looked through the sys_creat (or really, sys_open) and sys_unlink
paths and don't see anything that should be wrong with them. Does anyone
have any ideas? The system is running a stock RH6.0 install with the
2.2.9 kernel plus the AXP SMP patches that Richard Henderson posted.

Thanks!

- Pete
