Re: The INN/mmap bug

From: Alexander Viro (viro@math.psu.edu)
Date: Mon Sep 18 2000 - 12:58:04 EST

Next message: Chris Mason: "Re: The INN/mmap bug"
Previous message: J.A. Magallon: "init/main.c do-while bug still there ?"
In reply to: Linus Torvalds: "Re: The INN/mmap bug"
Next in thread: Linus Torvalds: "Re: The INN/mmap bug"
Reply: Linus Torvalds: "Re: The INN/mmap bug"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

OK, let's see. I've tried to describe what we have now (marking the
bugs), plus a proposal that would give somewhat saner logic (at the end
of the posting). Comments are more than welcome.

        * the life of a page is clearly divided into two parts: before it
can be mapped to a user context and after that.
        * no read requests can be issued in the second stage; by that
time the page should be permanently up-to-date [ClearPageUptodate() in
generic_file_write() is a bug]
        * the buffer ring can be dropped only under the page lock.
        * in the first stage the buffer ring can be dropped only if the
page doesn't contain data more recent than the data in the fs.
        * in the second stage the buffer ring can be dropped if there is
no IO scheduled or in progress [note: buffer_tied() might be handy, meaning
"dirty or under IO"; i.e. when ll_rw_block() does or will need this bh]
        * we have several bh state components and the thing is a big,
fscking mess. If we look at the areas outside of the page lock we have:

1st stage, !uptodate, !mapped   contents are either the same as on disk
                                or junk.
1st stage, !uptodate, mapped    failed attempt to read. Contents may be
                                the same as the last data on disk or
                                may be junk.
1st stage, uptodate, !mapped    hole. Contents are all zeroes. It may also
                                be the result of a failed attempt to map;
                                we have no way to tell.
1st stage, uptodate, mapped     data. Same as on disk or newer.
2nd stage, !uptodate, !mapped   contents are the same as or newer than on
                                disk, mapping unknown. [Recipe for disaster,
                                since the current code may try to read it;
                                should be map-and-be-done-with-that]
2nd stage, !uptodate, mapped    failed attempt to read that should never
                                happen. [Bug]
2nd stage, uptodate, !mapped    hole. Contents may be nonzero due to
                                access via mmap(). May be the result of a
                                failed attempt to map [we don't handle
                                such errors]
2nd stage, uptodate, mapped     data. Same as on disk or newer.
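
The table above can be restated as a small decision function. This is
only a sketch for illustration: the names (sk_stage, sk_meaning,
sk_classify(), sk_may_issue_read()) are hypothetical and do not exist in
the kernel, and the "stage" is passed in explicitly because nothing in
the current bh state records it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; none of these are kernel identifiers. */
enum sk_stage { SK_STAGE_1 = 1, SK_STAGE_2 = 2 }; /* before/after the page can be user-mapped */

enum sk_meaning {
    SK_DISK_OR_JUNK,       /* 1st, !uptodate, !mapped */
    SK_READ_FAILED,        /* 1st, !uptodate, mapped  */
    SK_HOLE_ZEROED,        /* 1st, uptodate, !mapped (or a failed map) */
    SK_DATA,               /* uptodate, mapped: same as on disk or newer */
    SK_NEEDS_MAP_ONLY,     /* 2nd, !uptodate, !mapped: map it, never read it */
    SK_BUG,                /* 2nd, !uptodate, mapped: should never happen */
    SK_HOLE_MAYBE_NONZERO  /* 2nd, uptodate, !mapped: may be dirtied via mmap() */
};

/* Map (stage, uptodate, mapped) to the row of the table it belongs to. */
enum sk_meaning sk_classify(enum sk_stage stage, bool uptodate, bool mapped)
{
    assert(stage == SK_STAGE_1 || stage == SK_STAGE_2);
    if (uptodate && mapped)
        return SK_DATA;
    if (stage == SK_STAGE_1) {
        if (uptodate)
            return SK_HOLE_ZEROED;
        return mapped ? SK_READ_FAILED : SK_DISK_OR_JUNK;
    }
    if (uptodate)
        return SK_HOLE_MAYBE_NONZERO;
    return mapped ? SK_BUG : SK_NEEDS_MAP_ONLY;
}

/* Reads are only legitimate in the first stage, and only for stale blocks. */
bool sk_may_issue_read(enum sk_stage stage, bool uptodate)
{
    return stage == SK_STAGE_1 && !uptodate;
}
```

Note that the two hole rows cannot distinguish a real hole from a failed
attempt to map; the sketch inherits that ambiguity from the state bits.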

        Something should be done about that: aside from forcing uptodate
on the second stage, we ought to flag the errors (possibly by BH_Error?)
        In __block_read_full_page() the page being uptodate is a bug: we
are going to lose data.
        Additionally, we have the "tied by IO" state, which can happen only
for a mapped uptodate bh.
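
A rough sketch of the "tied" test and of the rules for dropping the
buffer ring, as stated above. Everything here is an assumption made for
illustration: the SK_* bits are not the kernel's BH_* values, the
sk_*() helpers are hypothetical, and whether the page holds data newer
than the fs is simply handed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for this sketch; not the kernel's BH_* values. */
#define SK_UPTODATE (1u << 0)
#define SK_MAPPED   (1u << 1)
#define SK_DIRTY    (1u << 2) /* IO will be needed */
#define SK_UNDER_IO (1u << 3) /* IO in progress */

/* "dirty or under IO": ll_rw_block() does or will need this bh. */
bool sk_buffer_tied(unsigned state)
{
    return (state & (SK_DIRTY | SK_UNDER_IO)) != 0;
}

/* Invariant from the text: a tied bh must be both mapped and uptodate. */
bool sk_state_consistent(unsigned state)
{
    unsigned need = SK_UPTODATE | SK_MAPPED;
    return !sk_buffer_tied(state) || (state & need) == need;
}

/*
 * May the buffer ring be dropped?
 *  - never without the page lock;
 *  - stage 1: only if the page holds nothing newer than the fs;
 *  - stage 2: only if no bh in the ring is tied.
 */
bool sk_can_drop_ring(int stage, bool page_locked, bool newer_than_fs,
                      const unsigned *ring, size_t n)
{
    assert(stage == 1 || stage == 2);
    if (!page_locked)
        return false;
    if (stage == 1)
        return !newer_than_fs;
    for (size_t i = 0; i < n; i++)
        if (sk_buffer_tied(ring[i]))
            return false;
    return true;
}
```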

We seriously rely on the fact that on stage 1 we drop the buffer ring
only if the page has no data in need of a write. We can't do that on
stage 2 (mmap() and dirtying the page from processes).
Because of the above we can afford re-reading the data on stage 1.
However, I'd rather see a bitmap (PAGE_CACHE_SIZE/blocksize bits)
describing what is not up-to-date. Then we could use it for deciding
what should not be read.
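
To make the bitmap idea concrete, here is a minimal sketch. It assumes
a 4096-byte PAGE_CACHE_SIZE and a 512-byte minimum blocksize (so at most
8 bits per page); the struct and the sk_*() helpers are hypothetical
names, not a proposed kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes for this sketch only. */
#define SK_PAGE_CACHE_SIZE 4096u
#define SK_MIN_BLOCKSIZE   512u

/* One bit per block in the page; a set bit means "NOT up-to-date". */
struct sk_page_bits {
    unsigned nblocks; /* PAGE_CACHE_SIZE / blocksize */
    unsigned stale;   /* bitmap of blocks that are not up-to-date */
};

/* Start with every block marked stale. */
void sk_bits_init(struct sk_page_bits *p, unsigned blocksize)
{
    assert(blocksize >= SK_MIN_BLOCKSIZE && blocksize <= SK_PAGE_CACHE_SIZE);
    assert((blocksize & (blocksize - 1)) == 0); /* power of two */
    p->nblocks = SK_PAGE_CACHE_SIZE / blocksize;
    p->stale = (1u << p->nblocks) - 1u;         /* nblocks <= 8 here */
}

/* Clear the bit once a block has been read or fully overwritten. */
void sk_mark_uptodate(struct sk_page_bits *p, unsigned block)
{
    assert(block < p->nblocks);
    p->stale &= ~(1u << block);
}

/* A block whose bit is clear should not be read again. */
bool sk_needs_read(const struct sk_page_bits *p, unsigned block)
{
    assert(block < p->nblocks);
    return (p->stale & (1u << block)) != 0;
}

/* The page as a whole is up-to-date once no stale bits remain. */
bool sk_page_uptodate(const struct sk_page_bits *p)
{
    return p->stale == 0;
}
```

With something like this, a partial write would clear the bits for the
blocks it fully covers, and a later read would skip exactly those
blocks instead of overwriting newer data.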

Comments?



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:18 EST