fyi, 2.0.36 read() EBADF bug

Clayton Weaver (cgweav@eskimo.com)
Sun, 12 Sep 1999 01:24:35 -0700 (PDT)


Anyone remember a read() bug on 2.0.x? It's there in both 2.0.29
and 2.0.36, identical behavior.

It goes like this:

about 150k in an mmap()ed buffer buf, PROT_READ, MAP_PRIVATE
open file descriptor fd, O_WRONLY, S_IWUSR (not the one mmap()ed)
(O_SYNC doesn't matter, happened with and without it)

skipping all of the error checking wrapper stuff that's in
the real code:

write(fd, buf, size); /* no problem, returned ssize_t size */
chmod(fdpath, S_IRUSR); /* no problem, fchmod() works ok too */
lseek(fd, 0, SEEK_SET); /* no problem */

fstat(fd, &struct_stat); /* this was just a diagnostic test after
the problem first turned up, again
no error, struct stat populated,
etc */

read(fd, some_other_buf, less_than_st_size_of_fd);

EBADF

The fd is still good right up until the read() call, when it
transitions into the kernel. "some_other_buf" has space
that's been in use through the 2gb of files the program
processed before getting to this point, ie it's still there and
the size for the read call matches.

What's more, the same sequence happens at the very beginning of the
program on a different file descriptor with no error (write(),
chmod(, S_IRUSR), lseek() to beginning, read()), called from the same
caller that calls read().

I got around it by replacing the lseek() with close()/open(), but
I just thought I'd mention it, since it was odd that it happened
in the exact same place on 2.0.29 and 2.0.36, ie something considered
stable and as far as anyone knows debugged.

No nfs or vfat or other wierdness, it's on the root partition, ext2fs
1.0.6 -O -m486, no other problems with that dir or filesystem.

Intuition suggest an error that isn't really a normal invalid
file descriptor (ie one the kernel has no record of for this
task), rather a problem that involved a file descriptor that there
was no precise errno value associated with, so somene used -EBADF
as the closest approximation. The close() works at the
same point in the code where the read() failed, which suggests
perhaps a different path through the buffer cache (odd place to
report EBADF from). Hence it's probably not a 2.2.x issue (buffer cache
code is too different), just something to remember for people who consider
2.0.3x as their stable fallback platform.

Regards,

Clayton Weaver
<mailto:cgweav@eskimo.com>
(Seattle)

"Everybody's ignorant, just in different subjects." Will Rogers

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/