Fix for data corrupting bug (years late)

From: Jari Ruusu (jari.ruusu@pp.inet.fi)
Date: Tue Sep 12 2000 - 12:47:34 EST


Hi David, Alan and Linus,

In June 1998, I spent 6 hours debugging and fixing a nasty data
corrupting bug. In October 1999, I finally had time to write a small
program to demonstrate that bug. Since then, I have spent 11 months
trying to get it fixed in the mainstream kernels. It is still unfixed.

Mainstream linux kernels still have a bug involving syncronous writes
crossing the end of device limit (ENOSPC condition). THIS BUG HAS
EXISTED FOR YEARS, AND CAUSES DATA CORRUPTION. It is also easily
repeatable. Things go wrong if all requirements are met:

1) Application does syncronous writes directly to a removable device,
      e.g. /dev/fd0
2) A ENOSPC condition is reached at middle of write. No problems
      occour if write ends at end of medium, or if write begins at end
      of medium.
3) Removable device is changed
4) Application reads data at end of device (e.g. last kilobyte)

The problem is in file fs/block_dev.c in function block_write(). If a
process is doing syncronous writes it does this in a loop:

1) Get a buffer
2) Copy data to buffer
3) Stuff the buffer in bufferlist[] array
4) If bufferlist[] array is full, write and then release the buffers

When all writing is done normally, process checks the bufferlist[]
array. If there are any buffers, they are written out and released.
Cool, that works. However, if errors occour in the loop (ENOSPC or EIO
for example), process bails out of block_write() without releasing
buffers in the bufferlist[] array. THIS CAUSES DATA CORRUPTION FOR
REMOVABLE DEVICES.

Two patches are included: one for 2.0.39pre8 and one for 2.2.15 to
2.2.18pre6. The 2.2.15 patch also works for 2.4.0-test8 with some fuzz.

Below is source code for a small test program (testbug.c) that
demonstrates the problem. You will need two formatted 1.44 MB floppies
to run the program. The program does following:

1) Prompt to insert floppy #1
2) Syncronously write 0x11 to last kilobyte of floppy #1
3) Prompt to insert floppy #2
4) Syncronously write 0xEE to last kilobyte of floppy #2
5) Prompt to insert floppy #1
6) Read last kilobyte of floppy #1
7) Compare the results, buggy kernels return 0xEE

Running ./testbug on a buggy kernel will output this:

> Insert a formatted floppy #1 to /dev/fd0
> All data on the floppy will be lost!
> Press ENTER to continue, CTRL-d to abort:
> Writing to floppy #1 ... done
>
> Insert a formatted floppy #2 to /dev/fd0
> All data on the floppy will be lost!
> Press ENTER to continue, CTRL-d to abort:
> Writing to floppy #2 ... done
>
> Insert floppy #1 to /dev/fd0
> Press ENTER to continue, CTRL-d to abort:
> Reading from floppy #1 ... done
>
> Your kernel has O_SYNC/ENOSPC bug!

Running ./testbug on patched kernel will output this:

> Insert a formatted floppy #1 to /dev/fd0
> All data on the floppy will be lost!
> Press ENTER to continue, CTRL-d to abort:
> Writing to floppy #1 ... done
>
> Insert a formatted floppy #2 to /dev/fd0
> All data on the floppy will be lost!
> Press ENTER to continue, CTRL-d to abort:
> Writing to floppy #2 ... done
>
> Insert floppy #1 to /dev/fd0
> Press ENTER to continue, CTRL-d to abort:
> Reading from floppy #1 ... done
>
> Your kernel works fine.

Regards,
Jari Ruusu <jari.ruusu@pp.inet.fi>

-------------------- CUT HERE --------------------

/* testbug.c */

/*
 * This program tests if your kernel has a O_SYNC/ENOSPC bug.
 * You will need two formatted 1440 KB floppies. Some data on the floppies
 * will be overwritten, so do a mkfs on them afterwards.
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FDDEV "/dev/fd0" /* floppy device */
#define FDSIZE 1440 /* floppy size in KB */

char buf[2048];

void fdwork(mode, code)
int mode;
int code;
{
    int fd;

    fflush(stdout);
    if((fd = open(FDDEV, mode ? O_WRONLY | O_SYNC : O_RDONLY, 0700)) == -1) {
        printf("open() failed. Aborted.\n");
        exit(1);
    }
    /* Seek to 1 KB less than floppy size */
    if(lseek(fd, (FDSIZE - 1) * 1024, SEEK_SET) == (off_t)-1) {
        printf("lseek() failed. Aborted.\n");
        exit(1);
    }
    memset(buf, code, 2048);
    if(mode) {
        /* Try to write 2 KB. Only 1 KB gets written (ENOSPC) */
        if(write(fd, buf, 2048) != 1024) {
            printf("write() failed. Aborted.\n");
            exit(1);
        }
    } else {
        /* Read last KB. Data should come from floppy */
        /* Buggy kernels return data from screwed buffers */
        if(read(fd, buf, 1024) != 1024) {
            printf("read() failed. Aborted.\n");
            exit(1);
        }
    }
    if(close(fd) == -1) {
        printf("close() failed. Aborted.\n");
        exit(1);
    }
}

void waitkey(void)
{
    char tmpbuf[80];

    printf("Press ENTER to continue, CTRL-d to abort: ");
    fflush(stdout);
    if(read(0, tmpbuf, sizeof(tmpbuf)) < 1) {
        printf("aborted\n");
        exit(2);
    }
}

void main(argc, argv)
int argc;
char **argv;
{
    char comp[1024];

    printf("\nInsert a formatted floppy #1 to %s\n", FDDEV);
    printf("All data on the floppy will be lost!\n");
    waitkey();
    printf("Writing to floppy #1 ... ");
    fdwork(1, 0x11);

    printf("done\n\nInsert a formatted floppy #2 to %s\n", FDDEV);
    printf("All data on the floppy will be lost!\n");
    waitkey();
    printf("Writing to floppy #2 ... ");
    fdwork(1, 0xEE);

    printf("done\n\nInsert floppy #1 to %s\n", FDDEV);
    waitkey();
    printf("Reading from floppy #1 ... ");
    fdwork(0, 0x55);
    printf("done\n\n");

    memset(comp, 0x11, 1024);
    if(memcmp(buf, comp, 1024)) {
        printf("Your kernel has O_SYNC/ENOSPC bug!\n\n");
        exit(3);
    }
    printf("Your kernel works fine.\n\n");
    exit(0);
}

-------------------- CUT HERE --------------------

--- linux-2.0.39pre8/fs/block_dev.c Sun Nov 15 20:33:11 1998
+++ linux/fs/block_dev.c Tue Sep 12 00:42:33 2000
@@ -27,7 +27,7 @@
         int block, blocks;
         loff_t offset;
         int chars;
- int written = 0;
+ int written = 0, retval = 0;
         struct buffer_head * bhlist[NBUF];
         unsigned int size;
         kdev_t dev;
@@ -57,8 +57,10 @@
         else
                 size = INT_MAX;
         while (count>0) {
- if (block >= size)
- return written ? written : -ENOSPC;
+ if (block >= size) {
+ retval = -ENOSPC;
+ goto cleanup;
+ }
                 chars = blocksize - offset;
                 if (chars > count)
                         chars=count;
@@ -90,7 +92,8 @@
                       bhlist[i] = getblk (dev, block+i, blocksize);
                       if(!bhlist[i]){
                         while(i >= 0) brelse(bhlist[i--]);
- return written ? written : -EIO;
+ retval= -EIO;
+ goto cleanup;
                       };
                     };
                     ll_rw_block(READ, blocks, bhlist);
@@ -101,8 +104,10 @@
                 };
 #endif
                 block++;
- if (!bh)
- return written ? written : -EIO;
+ if (!bh) {
+ retval = -EIO;
+ goto cleanup;
+ }
                 p = offset + bh->b_data;
                 offset = 0;
                 filp->f_pos += chars;
@@ -130,6 +135,7 @@
                 if(write_error)
                         break;
         }
+ cleanup:
         if ( buffercount ){
                 ll_rw_block(WRITE, buffercount, bufferlist);
                 for(i=0; i<buffercount; i++){
@@ -139,10 +145,11 @@
                         brelse(bufferlist[i]);
                 }
         }
- filp->f_reada = 1;
+ if(!retval)
+ filp->f_reada = 1;
         if(write_error)
                 return -EIO;
- return written;
+ return written ? written : retval;
 }
 
 int block_read(struct inode * inode, struct file * filp,

-------------------- CUT HERE --------------------

--- linux-2.2.15/fs/block_dev.c Thu May 4 19:46:41 2000
+++ linux/fs/block_dev.c Thu May 4 22:07:46 2000
@@ -24,7 +24,7 @@
         ssize_t block, blocks;
         loff_t offset;
         ssize_t chars;
- ssize_t written = 0;
+ ssize_t written = 0, retval = 0;
         struct buffer_head * bhlist[NBUF];
         size_t size;
         kdev_t dev;
@@ -54,8 +54,10 @@
         else
                 size = INT_MAX;
         while (count>0) {
- if (block >= size)
- return written ? written : -ENOSPC;
+ if (block >= size) {
+ retval = -ENOSPC;
+ goto cleanup;
+ }
                 chars = blocksize - offset;
                 if (chars > count)
                         chars=count;
@@ -67,15 +69,19 @@
                         if (chars != blocksize)
                                 fn = bread;
                         bh = fn(dev, block, blocksize);
- if (!bh)
- return written ? written : -EIO;
+ if (!bh) {
+ retval = -EIO;
+ goto cleanup;
+ }
                         if (!buffer_uptodate(bh))
                                 wait_on_buffer(bh);
                 }
 #else
                 bh = getblk(dev, block, blocksize);
- if (!bh)
- return written ? written : -EIO;
+ if (!bh) {
+ retval = -EIO;
+ goto cleanup;
+ }
 
                 if (!buffer_uptodate(bh))
                 {
@@ -99,7 +105,8 @@
                         if (!bhlist[i])
                         {
                           while(i >= 0) brelse(bhlist[i--]);
- return written ? written : -EIO;
+ retval = -EIO;
+ goto cleanup;
                         }
                       }
                     }
@@ -108,7 +115,8 @@
                     wait_on_buffer(bh);
                     if (!buffer_uptodate(bh)) {
                           brelse(bh);
- return written ? written : -EIO;
+ retval = -EIO;
+ goto cleanup;
                     }
                   };
                 };
@@ -141,6 +149,7 @@
                 if(write_error)
                         break;
         }
+ cleanup:
         if ( buffercount ){
                 ll_rw_block(WRITE, buffercount, bufferlist);
                 for(i=0; i<buffercount; i++){
@@ -150,10 +159,11 @@
                         brelse(bufferlist[i]);
                 }
         }
- filp->f_reada = 1;
+ if(!retval)
+ filp->f_reada = 1;
         if(write_error)
                 return -EIO;
- return written;
+ return written ? written : retval;
 }
 
 ssize_t block_read(struct file * filp, char * buf, size_t count, loff_t *ppos)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Sep 15 2000 - 21:00:18 EST