Re: Change PAGE_SIZE from minimum 4k to 12k
From: Kevin McKinney
Date: Fri May 26 2017 - 09:15:10 EST
On Thu, May 25, 2017 at 5:55 PM, Pavel Machek <pavel@xxxxxx> wrote:
> Hi!
>
>> >> > Would it be possible to have a custom block device driver read/write
>> >> > in increments of 12k instead of reading/writing data in 4k increments?
>> >> > In other words, I would like to change the default page size on an
>> >> > x86_64 platform (4.4.0 kernel) from 4k to 12k as the minimum page
>> >> > size? I understand I may have negative performance due to
>> >> > fragmentation. Any help would be appreciated.
>> >> >
>> >> > If this is the wrong mailing list, please let me know the right one to use.
>> >>
>> >> I won't say "no" but amount of work necessary is likely measured in
>> >> man-years. Plus, hardware page size _is_ 4KB.
>> >
>> > Or a few other much larger sizes. Not that it actually matters. You can
>> > implement a larger software page size for a platform but it would still
>> > need to be a power of two, and you'd have trouble running some existing
>> > binaries for x86.
>> >
>> > What problem are you *actually* trying to solve ?
>>
>> Thanks for responding! I work for a company that created custom
>> hardware with 4 banks of drives. Each bank is 12 terabytes; and each
>> bank is controlled by a separate RAID controller. We created a custom
>> block device driver that is responsible for moving data to each bank.
>> The RAID controller will then stripe the data across the appropriate
>> disks for the specified bank. The problem we are having is by moving
>> in increments of 4k, we are unable to utilize all 48 terabytes; we are
>> only able to utilize 32 terabytes. If we could move in increments of
>> 12K that would allow us to use the full 12 terabytes for each bank.
>
> 12TB is not that big.. are we talking spinning rust or something
> special?
>
> I mean, what does it have to do with page size? 48TB device, that's 5
> SATA drives... that's not even that big.
>
> Pavel
Yes Pavel, you are right. The original idea was that if we could get
the kernel block layer to emit 6K- or 12K-aligned block sizes,
we could more easily stripe the data across the 12 drives in each
bank. Essentially, we would move more data per I/O.
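
(Editorial sketch, not part of the original message.) The thread does not say why 4k transfers reach only 32 of the 48 terabytes. One reading that happens to reproduce that figure is that each I/O is split into 512-byte sectors and dealt out one sector per drive, always starting at the first drive of the bank. Both the 512-byte sector size and that layout are assumptions made for illustration; the drive, bank, and capacity counts come from the thread, and the function names are hypothetical.

```python
SECTOR_BYTES = 512      # assumption: sector size is not stated in the thread
DRIVES_PER_BANK = 12    # from the thread
BANK_TB = 12            # from the thread
BANKS = 4               # from the thread


def drives_touched(io_bytes):
    """Drives that receive data when one I/O is dealt out one sector per
    drive, round-robin, always starting at the first drive (assumed layout)."""
    sectors = io_bytes // SECTOR_BYTES
    return min(sectors, DRIVES_PER_BANK)


def usable_tb(io_bytes):
    """Total capacity reachable across all banks under the assumed layout."""
    return BANKS * BANK_TB * drives_touched(io_bytes) // DRIVES_PER_BANK


if __name__ == "__main__":
    for size in (4096, 6144, 12288):
        print(size, drives_touched(size), usable_tb(size))
```

Under these assumptions a 4096-byte transfer touches 8 of the 12 drives, which gives 32 terabytes, while 6144-byte and 12288-byte transfers touch all 12 and give the full 48. The real controller may lay data out differently.
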
Since then, we have decided to solve this problem with a different
approach, in hardware (FPGA). Thanks for responding, and
I apologize for wasting your time.
-Kevin
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html