Re: The INN/mmap bug

From: Daniel Phillips (news-innominate.list.linux.kernel@innominate.de)
Date: Tue Sep 19 2000 - 10:42:33 EST


Alexander Viro wrote:
>
> On Tue, 19 Sep 2000, Daniel Phillips wrote:
>
> > The more I think about it the less clear and ambiguous I find it.
> > When you add the dirty bit into the pot you get:
> >
> > Mapped, Uptodate, Dirty: not possible
>
> Sure, it is possible - that's how the write happens
>
> > !Mapped, Uptodate, Dirty: not possible
> > Mapped, !Uptodate, Dirty: pending write
>
> s/pending write/obvious bug/, damnit. Daniel, just think for a second: we
> have the buffer read to be picked by bdflush and written to disk, while
> the _contents_ _is_ _not_ _uptodate_. Just what can you expect when write
> succeeds? Junk on disk, right? You know, GIGO queue - garbage in, garbage
> out...
>
> > !Mapped, !Uptodate, Dirty: pending map and write
>
> Wrong. It's an instant BUG at line 711 in ll_rw_blk.c - remember these
> reports?

It wasn't a conceptual error, it was a typo: here's the corrected
table with the wrongly tagged states you noticed
interchanged:

         Mapped, Uptodate, Dirty: pending write
        !Mapped, Uptodate, Dirty: not possible
         Mapped, !Uptodate, Dirty: not possible
        !Mapped, !Uptodate, Dirty: pending map and write
         Mapped, Uptodate, !Dirty: regular block
        !Mapped, Uptodate, !Dirty: hole of zeroes
         Mapped, !Uptodate, !Dirty: unread
        !Mapped, !Uptodate, !Dirty: pending map
 
> Sure, the dirty bit is not orthogonal to the rest. You don't need to do
> any complex analysis - it's as simple as
> * if I don't know _where_ to write the data - I'ld better not feed
> the request to ll_rw_block(), or it may get PO'd
> * if I know that data is junk - I don't want it hitting the disk.
>
> Dirty bit == request fed into the funnel and can be on disk any moment now.
> Locked == already in IO subsystem.
>
> That's it - completely independent from the rest, except that you don't
> want the whole write mechanism applied to non-uptodate or non-mapped
> pieces.

Right, two obvious bugs, which is the same thing as saying two uneeded
states. Now, I'm just trying to be tidy and fit this all into a nice
regular model that I can represent with state transition diagrams. I
don't know for sure why it's good to do that, but it seems good. I
like to imagine that if I could just get them all down on paper in a
regular form some possible optimizations would just jump right out at
me. Mind you, this is not necessary for the filesystem work I'm
doing, this is more like a side interest. I get along just fine with
the existing mechanism.

> Think about IO as a memory bus - cache controller deals with writing the
> cache line to RAM, but you don't want it to try that on lines that don't
> have PA already calculated or have invalid contents. "mark dirty" == point
> the write-behind mechanism to it and let it decide when the thing must be
> written.

I'd also like it to be able to decide on its own when to map the
block. If I'm fully replacing the contents of a buffer on a page I
would like to be able to just mark the buffer dirty without mapping it
and let bdflush map it later. Right now it doesn't work because
bdflush tries to feed a null block to ll_rw_block, but a small change
would fix that: the flush daemon just has to notice the buffer is on a
page, then it can call get_block to map it. Does this accomplish
anything useful? I *think* so but I'm not sure. It seems to me that
if you handled this properly you could have, for example, a temporary
file written, read back and deleted, all without ever touching the
disk, or even going into get_block to check for mappings, let alone
fussing around with the allocation bitmaps.

I'm also trying to determine if a 'don't know' mapping state would be
useful for optimizing certain I/O paths.

> Think how to make the cache indexed by virtual address (not by
> physical, as in case of x86) work correctly. That's what pagecache is -
> software MMU with cache-by-VA architecture.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:20 EST