Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

From: Chris Mason
Date: Wed Jan 22 2014 - 13:02:29 EST

On Wed, 2014-01-22 at 09:21 -0800, James Bottomley wrote:
> On Wed, 2014-01-22 at 17:02 +0000, Chris Mason wrote:

[ I like big sectors and I cannot lie ]

> > I really think that if we want to make progress on this one, we need
> > code and someone that owns it. Nick's work was impressive, but it was
> > mostly there for getting rid of buffer heads. If we have a device that
> > needs it and someone working to enable that device, we'll go forward
> > much faster.
> Do we even need to do that (eliminate buffer heads)? We cope with 4k
> sector only devices just fine today because the bh mechanisms now
> operate on top of the page cache and can do the RMW necessary to update
> a bh in the page cache itself which allows us to do only 4k chunked
> writes, so we could keep the bh system and just alter the granularity of
> the page cache.

We're likely to have people mixing 4K drives and <fill in some other
size here> on the same box. We could just go with the biggest size and
use the existing bh code for the sub-pagesized blocks, but I really
hesitate to change VM fundamentals for this.

From a pure code point of view, it may be less work to change it once in
the VM. But from an overall system impact point of view, it's a big
change in how the system behaves just for filesystem metadata.

> The other question is if the drive does RMW between 4k and whatever its
> physical sector size, do we need to do anything to take advantage of
> it ... as in what would altering the granularity of the page cache buy
> us?

The real benefit is when and how the reads get scheduled. We're able to
do a much better job pipelining the reads, controlling our caches and
reducing write latency by having the reads done up in the OS instead of
the drive.
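
For illustration only, the read-modify-write (RMW) step being discussed
can be sketched as a toy simulation. This is not kernel code and not
anything proposed in the thread: the Device class, rmw_write_block(), and
the 16K physical sector size are all hypothetical, chosen just to show why
a 4K write to a larger sector costs one read plus one write, wherever that
read ends up being scheduled.

```python
LOGICAL = 4096    # block size the filesystem writes (4K, as in the thread)
PHYSICAL = 16384  # hypothetical larger physical sector size (an assumption)


class Device:
    """Toy in-memory device that only transfers whole physical sectors."""

    def __init__(self, nsectors):
        self.data = bytearray(nsectors * PHYSICAL)
        self.reads = 0
        self.writes = 0

    def read_sector(self, n):
        self.reads += 1
        return bytearray(self.data[n * PHYSICAL:(n + 1) * PHYSICAL])

    def write_sector(self, n, buf):
        assert len(buf) == PHYSICAL
        self.writes += 1
        self.data[n * PHYSICAL:(n + 1) * PHYSICAL] = buf


def rmw_write_block(dev, block, payload):
    """Write one LOGICAL-sized block using read-modify-write."""
    assert len(payload) == LOGICAL
    sector, within = divmod(block * LOGICAL, PHYSICAL)
    buf = dev.read_sector(sector)            # read the whole physical sector
    buf[within:within + LOGICAL] = payload   # modify only the 4K being written
    dev.write_sector(sector, buf)            # write the whole sector back


dev = Device(nsectors=2)
rmw_write_block(dev, block=5, payload=b"\xab" * LOGICAL)
print(dev.reads, dev.writes)
```

In this sketch the read sits directly in the write path. The argument for
doing RMW in the OS is that the host can issue that read earlier, or
satisfy it from cache, instead of stalling on it inside the drive.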

