Re: Big I/O requests are split into small ones due to unaligned ext4 partition boundary?
From: Ming Lei
Date: Thu Dec 15 2016 - 07:43:20 EST
On Thu, Dec 15, 2016 at 7:47 PM, Dexuan Cui <decui@xxxxxxxxxxxxx> wrote:
> Hi, when I run "mkfs.ext4 /dev/sdc2" in a Linux virtual machine on Hyper-V,
> where I have applied a disk limit of IOPS=500, the command takes much
> more time if the ext4 partition boundary is not properly aligned:
> Example 1: it takes ~7 minutes with average wMB/s = 0.3 (slow)
> Example 2: it takes ~3.5 minutes with average wMB/s = 0.6 (slow)
> Example 3: it takes ~0.5 minute with average wMB/s = 4 (expected)
> strace shows the mkfs.ext4 program calls seek()/write() a lot and most of
> the writes use 32KB buffers (this should be big enough), and the program
> only invokes fsync() once, after it issues all the writes -- the fsync() takes
> more than 99% of the time.
> By logging SCSI commands, the SCSI Write(10) command is used here for the
> userspace 32KB write:
> in example 1, *each* command writes 1 or 2 sectors only (1 sector = 512 bytes);
> in example 2, *each* command writes 2 or 4 sectors only;
> in example 3, each command writes 1024 sectors.
> It looks like the kernel block I/O layer somehow splits big user-space buffers
> into really small write requests (1, 2, and 4 sectors)?
> This looks really strange to me.
> Note: in my tests, this strange issue happens on the 4.4 and the mainline 4.9
> kernels, but the stable 3.18.45 kernel doesn't have the issue, i.e. all three
> test examples above finish in ~0.5 minute.
> Any comment?
I remember that we discussed this kind of issue before. Please see the
discussion and check whether the patch fixes your issue.
> -- Dexuan
> The max IOPS are measured in 8KB increments, meaning the max
> throughput is 8KB * 500 = 4000KB/s.
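The arithmetic in that note can be extended to the command sizes reported above. The following is a rough sketch in Python, not a measurement: it assumes each SCSI command is charged one I/O unit per started 8KB (the rounding rule is an assumption, as is the helper name est_throughput_kb_s), and it ignores every other source of latency.

```python
SECTOR_BYTES = 512          # logical sector size reported by fdisk
IOPS_LIMIT = 500            # limit applied to the virtual disk
INCREMENT_BYTES = 8 * 1024  # IOPS are counted in 8KB increments

def est_throughput_kb_s(sectors_per_cmd):
    """Estimated write throughput in KB/s for a fixed command size.

    Assumption: a command of N bytes costs ceil(N / 8KB) I/O units.
    """
    size = sectors_per_cmd * SECTOR_BYTES
    units = -(-size // INCREMENT_BYTES)   # ceiling division
    cmds_per_second = IOPS_LIMIT / units
    return cmds_per_second * size / 1024

for sectors in (1, 2, 4, 1024):
    print(sectors, "sectors/cmd:", est_throughput_kb_s(sectors), "KB/s")
```

Under these assumptions, 1 to 2 sectors per command gives 250 to 500 KB/s, 2 to 4 sectors gives 500 to 1000 KB/s, and 1024 sectors gives 4000 KB/s, which is in line with the reported averages of 0.3, 0.6 and 4 wMB/s.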
> This is the partition info of my 20GB disk:
> # fdisk -l /dev/sdc
> Disk /dev/sdc: 20 GiB, 21474836480 bytes, 41943040 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disklabel type: dos
> Disk identifier: 0x00000000
> Device     Boot    Start      End  Sectors  Size Id Type
> /dev/sdc1              1 14281784 14281784  6.8G 82 Linux swap / Solaris
> /dev/sdc2       14281785 41929649 27647865 13.2G 83 Linux
> Here (Example 1), start_sector = 14281785, end_sector = 41929649.
> Example 2: start_sector = 14282752, end_sector = 41929649
> Example 3: start_sector = 14282752, end_sector = 41943039
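To make the alignment of the three layouts easier to compare, here is a small Python sketch. It assumes the three start/end pairs above correspond, in order, to Examples 1 to 3, and it reports the largest power-of-two alignment (capped at 8 sectors, i.e. the 4096-byte physical sector size from fdisk) of both the start sector and the partition length. The helper name pow2_alignment is hypothetical. The output only shows a correlation with the observed command sizes; it is not a diagnosis of where the splitting happens.

```python
def pow2_alignment(n, cap=8):
    """Largest power of two (at most `cap`) that divides n, in sectors."""
    align = 1
    while align < cap and n % (align * 2) == 0:
        align *= 2
    return align

# (start_sector, end_sector) per example; the mapping to examples is assumed
layouts = {
    1: (14281785, 41929649),
    2: (14282752, 41929649),
    3: (14282752, 41943039),
}

for example, (start, end) in layouts.items():
    length = end - start + 1  # partition size in 512-byte sectors
    print(f"Example {example}: start aligned to {pow2_alignment(start)} "
          f"sector(s), length aligned to {pow2_alignment(length)} sector(s)")
```

With these numbers, Example 1 is aligned to 1 sector in both start and length, Example 2 has an 8-sector-aligned start but a length aligned to only 2 sectors, and Example 3 is aligned to 8 sectors in both.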