Re: designing fast, streaming disk i/o with mmap: help wanted

From: Jeff V. Merkey (jmerkey@timpanogas.com)
Date: Sun Apr 02 2000 - 18:28:03 EST


After watching this thread, perhaps what you should do is just implement
some code that does what you're trying to accomplish, and submit it for
folks to look at. I've noticed that the people who do the Linux kernel
work are not always open to "other" ideas, but they always seem to be
open to "better" ideas.

Jeff

Paul Barton-Davis wrote:
>
> OK, the design story continues.
>
> Here are some numbers. Each line has this format:
>
> <file-id>: lock <num-bytes> (<start-byte> .. <end-byte>) = <msecs>
>
> Each <file-id> has been preallocated at 23040044 bytes, and is mmapped
> into the process address space.
>
> The lines show the amount of time that mlock(2) took to return when
> asked to lock the specified bytes into memory. Note that munlock(2)
> is being used as well, so the total amount of memory locked at any
> one time stays more or less constant.
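>
> For reference, here is a minimal sketch of the sort of loop that
> produces numbers like these (simplified to a single file; the file
> name is hypothetical and most error handling is omitted):
>
>   #include <fcntl.h>
>   #include <stdio.h>
>   #include <sys/mman.h>
>   #include <sys/stat.h>
>   #include <sys/time.h>
>
>   /* time one mlock() of `len` bytes at `off` in an mmapped file;
>      mlock() is what forces the seek+read into the buffer cache */
>   static double timed_mlock (char *base, size_t off, size_t len)
>   {
>       struct timeval t0, t1;
>       gettimeofday (&t0, NULL);
>       if (mlock (base + off, len) != 0)
>           perror ("mlock");
>       gettimeofday (&t1, NULL);
>       return (t1.tv_sec - t0.tv_sec) * 1000.0 +
>              (t1.tv_usec - t0.tv_usec) / 1000.0;
>   }
>
>   int main (void)
>   {
>       int fd = open ("audio.0", O_RDONLY);    /* hypothetical name */
>       struct stat sb;
>       char *base;
>       size_t chunk = 65536, off;
>
>       fstat (fd, &sb);
>       base = mmap (NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
>
>       for (off = 0; off + chunk <= (size_t) sb.st_size; off += chunk) {
>           printf ("lock %lu (%lu .. %lu) = %.2fmsecs\n",
>                   (unsigned long) chunk, (unsigned long) off,
>                   (unsigned long) (off + chunk),
>                   timed_mlock (base, off, chunk));
>           munlock (base + off, chunk);  /* keep the locked total bounded */
>       }
>       return 0;
>   }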
>
> For the most part, things look good. The performance is pretty much
> exactly what I would expect from other tests of my disk
> performance. The basic number to spot is roughly 16ms to mlock()
> (i.e. seek and read into the buffer cache) 64K of data. Sometimes, it's
> a lot less (0.1msec or less) if the data is already there thanks to
> read-ahead or fortuitous hang-overs from prior runs. Here is an
> example:
>
> 0: lock 65536 (1015852 .. 1081388) = 12.94msecs
> 1: lock 65536 (1015852 .. 1081388) = 14.64msecs
> 2: lock 65536 (1015852 .. 1081388) = 18.80msecs
> 3: lock 65536 (1015852 .. 1081388) = 10.40msecs
> 4: lock 65536 (1015852 .. 1081388) = 10.67msecs
> 5: lock 65536 (1015852 .. 1081388) = 11.78msecs
> 6: lock 65536 (1015852 .. 1081388) = 17.26msecs
> 7: lock 65536 (1015852 .. 1081388) = 16.70msecs
> 8: lock 65536 (1015852 .. 1081388) = 10.41msecs
> 9: lock 65536 (1015852 .. 1081388) = 10.21msecs
> 10: lock 65536 (1015852 .. 1081388) = 11.94msecs
> 11: lock 65536 (1015852 .. 1081388) = 10.56msecs
> 12: lock 65536 (1015852 .. 1081388) = 18.21msecs
> 13: lock 65536 (1048620 .. 1114156) = 10.42msecs
> 14: lock 65536 (1048620 .. 1114156) = 13.99msecs
> 15: lock 65536 (1048620 .. 1114156) = 0.10msecs
> 16: lock 65536 (1048620 .. 1114156) = 0.09msecs
> 17: lock 65536 (1048620 .. 1114156) = 0.09msecs
> 18: lock 65536 (1048620 .. 1114156) = 0.09msecs
> 19: lock 65536 (1048620 .. 1114156) = 0.10msecs
> 20: lock 65536 (1048620 .. 1114156) = 0.08msecs
> 21: lock 65536 (1048620 .. 1114156) = 0.08msecs
> 22: lock 65536 (1048620 .. 1114156) = 0.09msecs
> 23: lock 65536 (1048620 .. 1114156) = 0.08msecs
>
> This is just fine. But as we move a lot further into the file and
> really start doing full-time disk I/O, I start to see this:
>
> 0: lock 65536 (4948012 .. 5013548) = 13.33msecs
> 1: lock 65536 (4948012 .. 5013548) = 16.13msecs
> 2: lock 65536 (4948012 .. 5013548) = 14.79msecs
> 3: lock 65536 (4948012 .. 5013548) = 73.73msecs <<<< !!!
> 4: lock 98304 (4948012 .. 5046316) = 58.68msecs <<<< !!!
> 5: lock 98304 (4948012 .. 5046316) = 18.43msecs
> 6: lock 98304 (4948012 .. 5046316) = 17.48msecs
> 7: lock 98304 (4948012 .. 5046316) = 16.70msecs
> 8: lock 65536 (4980780 .. 5046316) = 10.31msecs
> 9: lock 65536 (4980780 .. 5046316) = 12.73msecs
> 10: lock 65536 (4980780 .. 5046316) = 18.65msecs
> 11: lock 65536 (4980780 .. 5046316) = 10.23msecs
> 12: lock 98304 (4980780 .. 5079084) = 10.06msecs
> 13: lock 98304 (4980780 .. 5079084) = 14.21msecs
> 14: lock 98304 (4980780 .. 5079084) = 12.60msecs
> 15: lock 98304 (4980780 .. 5079084) = 13.74msecs
> 16: lock 98304 (4980780 .. 5079084) = 12.91msecs
> 17: lock 98304 (4980780 .. 5079084) = 12.30msecs
> 18: lock 98304 (4980780 .. 5079084) = 12.43msecs
> 19: lock 98304 (4980780 .. 5079084) = 28.62msecs
> 20: lock 65536 (5013548 .. 5079084) = 12.53msecs
> 21: lock 65536 (5013548 .. 5079084) = 12.35msecs
> 22: lock 65536 (5013548 .. 5079084) = 11.75msecs
> 23: lock 65536 (5013548 .. 5079084) = 11.13msecs
>
> All of a sudden, some files are taking up to 5 times as long to read
> the data.
>
> So, now a word about these files. They were created on an otherwise
> empty, fresh, squeaky clean ext2 partition, #2 on the following disk
> (c/o fdisk):
>
> Disk /dev/sdb: 255 heads, 63 sectors, 554 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot    Start      End   Blocks   Id  System
> /dev/sdb1             1      319  2562336   83  Linux
> /dev/sdb2           320      415   771120   83  Linux
> /dev/sdb3           416      511   771120   83  Linux
> /dev/sdb4           512      554   345397+  83  Linux
>
> My assumptions were:
>
> 1) each file is contiguous (they were created serially)
> 2) the partition is as physically contiguous as it could be
>
> I therefore anticipated that a serial read through all the files would
> give rise to roughly uniform seek+read times. Instead, we see
> variations of up to 5-fold.
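>
> (One way I could verify assumption (1) is the FIBMAP ioctl, which maps
> a file's logical blocks to physical block numbers. A sketch, assuming
> root, since FIBMAP is privileged, and skipping most error handling:)
>
>   #include <fcntl.h>
>   #include <stdio.h>
>   #include <sys/ioctl.h>
>   #include <sys/stat.h>
>   #include <linux/fs.h>          /* FIBMAP, FIGETBSZ */
>
>   /* report any discontiguity in a file's on-disk block layout */
>   int main (int argc, char **argv)
>   {
>       int fd = open (argv[1], O_RDONLY);
>       struct stat sb;
>       int bsz, nblocks, blk, phys, prev = -1;
>
>       fstat (fd, &sb);
>       ioctl (fd, FIGETBSZ, &bsz);
>       nblocks = (sb.st_size + bsz - 1) / bsz;
>
>       for (blk = 0; blk < nblocks; blk++) {
>           phys = blk;
>           if (ioctl (fd, FIBMAP, &phys) < 0) {
>               perror ("FIBMAP");      /* EPERM if not root */
>               return 1;
>           }
>           if (prev != -1 && phys != prev + 1)
>               printf ("gap before logical block %d (phys %d)\n",
>                       blk, phys);
>           prev = phys;
>       }
>       return 0;
>   }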
>
> So, my question:
>
> * why is the mlock/seek+read time so much larger ? what kinds of
> disk geometry issues or kernel buffer cache issues
> could give rise to this kind of behaviour ? is there anything
> i can do about it ?
>
> Needless to say, this sucks. The thread doing the reading has the task
> of keeping the far edge of the locked region a certain distance ahead
> of a continuously moving point that is updated by another thread.
> thread. If it takes a performance hit on one mlock() call, then later
> ones end up having to handle a longer segment of the file, and thus
> get progressively slower, so that eventually, the "read ahead" falls
> behind the "reader", and catastrophe ensues. This occurs regardless of
> how much read ahead I use, because once things start going downhill,
> they just keep sliding. Using more read ahead just means that it takes
> longer to hit bottom, so to speak.
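>
> (For concreteness, the shape of that read-ahead thread, sketched for a
> single file with a polling loop standing in for the real
> synchronization; CHUNK and READ_AHEAD are illustrative values:)
>
>   #include <stddef.h>
>   #include <sys/mman.h>
>   #include <unistd.h>
>
>   #define CHUNK      65536
>   #define READ_AHEAD (8 * CHUNK)
>
>   extern volatile size_t reader_pos;  /* advanced by the reader thread */
>   extern size_t file_size;
>
>   /* keep the far edge of the locked region READ_AHEAD bytes past the
>      reader, and unlock what the reader has already gone past */
>   void *readahead_thread (void *arg)
>   {
>       char  *base = arg;          /* start of the mmapped file */
>       size_t lock_edge = 0;       /* far edge of the locked region */
>       size_t unlock_edge = 0;     /* trailing edge */
>
>       for (;;) {
>           while (lock_edge < reader_pos + READ_AHEAD &&
>                  lock_edge + CHUNK <= file_size) {
>               mlock (base + lock_edge, CHUNK);  /* seek+read happens here */
>               lock_edge += CHUNK;
>           }
>           while (unlock_edge + CHUNK <= reader_pos) {
>               munlock (base + unlock_edge, CHUNK);
>               unlock_edge += CHUNK;
>           }
>           usleep (10000);         /* placeholder for a condition variable */
>       }
>       return NULL;                /* not reached */
>   }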
>
> Finally, where do I read about rawio ? :)
>
> --p
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


