Request for clarification on disk i/o limitations

From: Mark Hull-Richter
Date: Thu Apr 26 2007 - 20:01:45 EST

I'm studying the behavior of the page cache vs. direct i/o in the
kernel (so far up to 2.6.9-42.0.10 for CentOS 4.4, moving to 2.6.18
for CentOS 5 soon).

I have found that large I/O requests get broken up differently between
direct i/o and cached i/o (not surprising, but it's the way they're
broken up that puzzles me).

While both direct and cached i/o break the i/o requests down into
page-sized chunks, these are subsequently merged back together until a
limit is reached. For direct i/o on my SCSI disks, that limit is 88
segments, or 704 sectors (352k of a 512k block), so when direct i/os
of 512k chunks are performed via dd, each 512k i/o usually gets broken
into one 704-sector chunk and one 320-sector chunk.

However, when the same kind of i/o is done through the page cache,
the pages are also chunked together, and each i/o sent to the scheduler
is again limited to 704 sectors. This time, though, the 704-sector
chunks typically get merged into a 1408-sector chunk, because the
limiting factor here is a maximum of 2048 sectors per i/o: adding a
third 704-sector chunk would bring the total to 2112 sectors, which
exceeds that limit.

My question is: what is the relationship between the hardware segment
limit of 88 segments per i/o (enforced in ll_new_segment and
__bio_add_page) and the hardware maximum of 2048 sectors per i/o
(enforced by ll_back_merge, and also by ll_front_merge, though I haven't
seen that one happen yet)? Where do these limits come from and why don't they

(The other subject of my study is why EMC NAS devices take at least
twice as long as SCSI "DAS" disks, and why paged i/o appears to run at
no more than half the speed of direct i/o on both types of disk,
specifically for large i/os (chunks larger than 256k).)

Please reply with CC to me directly, as I am not yet on the list (but
I will be soon).


Mark Hull-Richter, Linux Kernel Engineer
DATAllegro
85 Enterprise, Second Floor, Aliso Viejo, CA 92656
949-680-3082 - Office 949-330-7691 - fax