Re: [RFC, PATCH 0/2] Reworking seeky detection for 2.6.34

From: Corrado Zoccolo
Date: Mon Mar 01 2010 - 18:01:27 EST

Next message: Paul Mundt: "Re: [RFC] microblaze: Support FRAME_POINTER for better backtrace"
Previous message: H. Peter Anvin: "Re: use of setjmp/longjmp in x86 emulator."
In reply to: Vivek Goyal: "Re: [RFC, PATCH 0/2] Reworking seeky detection for 2.6.34"
Next in thread: Corrado Zoccolo: "Re: [RFC, PATCH 0/2] Reworking seeky detection for 2.6.34"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Vivek,
On Mon, Mar 1, 2010 at 5:35 PM, Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> On Sat, Feb 27, 2010 at 07:45:38PM +0100, Corrado Zoccolo wrote:
>>
>> Hi, I'm resending the rework seeky detection patch, together with
>> the companion patch for SSDs, in order to get some testing on more
>> hardware.
>>
>> The first patch in the series fixes a regression introduced in 2.6.33
>> for random mmap reads of more than one page, when multiple processes
>> are competing for the disk.
>> There is at least one HW RAID controller where it reduces performance,
>> though (but this controller generally performs worse with CFQ than
>> with NOOP, probably because it is performing non-work-conserving
>> I/O scheduling inside), so more testing on RAIDs is appreciated.
>>
>
> Hi Corrado,
>
> This time I don't have the machine on which I previously reported
> regressions, but somebody has exported two LUNs to me from a storage box
> over SAN, and I have done my testing on those. With this seek patch applied,
> I still see the regressions.
>
> iosched=cfq  Filesz=1G  bs=64K
>
>                            2.6.33               2.6.33-seek
> workload  Set  NR  RDBW(KB/s)  WRBW(KB/s)  RDBW(KB/s)  WRBW(KB/s)  %Rd   %Wr
> --------  ---  --  ----------  ----------  ----------  ----------  ----  ---
> brrmmap   3    1   7113        0           7044        0           0%    0%
> brrmmap   3    2   6977        0           6774        0           -2%   0%
> brrmmap   3    4   7410        0           6181        0           -16%  0%
> brrmmap   3    8   9405        0           6020        0           -35%  0%
> brrmmap   3    16  11445       0           5792        0           -49%  0%
>
>                            2.6.33               2.6.33-seek
> workload  Set  NR  RDBW(KB/s)  WRBW(KB/s)  RDBW(KB/s)  WRBW(KB/s)  %Rd   %Wr
> --------  ---  --  ----------  ----------  ----------  ----------  ----  ---
> drrmmap   3    1   7195        0           7337        0           1%    0%
> drrmmap   3    2   7016        0           6855        0           -2%   0%
> drrmmap   3    4   7438        0           6103        0           -17%  0%
> drrmmap   3    8   9298        0           6020        0           -35%  0%
> drrmmap   3    16  11576       0           5827        0           -49%  0%
>
>
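As a quick cross-check of the %Rd column, here is a minimal Python sketch that recomputes the read-bandwidth change for the brrmmap rows. The function name pct_change is made up for illustration. The truncation toward zero is an assumption inferred from the numbers (for example, 6020 vs 9405 is about -35.99% and is listed as -35%); it is not stated anywhere in the message.

```python
def pct_change(baseline_kbps: int, patched_kbps: int) -> int:
    """Percentage change from baseline to patched, truncated toward zero.

    Truncation (rather than rounding) is an assumption that happens to
    match the figures reported in the table.
    """
    return int((patched_kbps - baseline_kbps) * 100 / baseline_kbps)


# (threads, 2.6.33 read KB/s, 2.6.33-seek read KB/s), copied from the
# brrmmap rows of the table.
brrmmap = [
    (1, 7113, 7044),
    (2, 6977, 6774),
    (4, 7410, 6181),
    (8, 9405, 6020),
    (16, 11445, 5792),
]

for nr, base, seek in brrmmap:
    print(f"NR={nr:2d}  {base:6d} -> {seek:6d} KB/s  {pct_change(base, seek):4d}%")
```

The same function applied to the drrmmap rows gives the figures listed in the second table.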
> I have run buffered random reads on mmaped files (brrmmap) and direct
> random reads on mmaped files (drrmmap) using fio. I ran these with an
> increasing number of threads, repeated each run 3 times, and report the
> average of the three sets.
>
> I used a file size of 1G and bs=64K, and ran each test sample for 30
> seconds.
>
> Because the new seek logic marks this type of cfqq as non-seeky and
> idles on it, I take a significant performance hit on storage boxes
> that have more than one spindle.

Thanks for testing on a different setup.
I wonder whether the problem on multi-spindle storage is the 64 KB
threshold. Can you run with a larger bs and see if there is a value for
which idling is better?
For example, on a 2-disk RAID 0, I would expect a bs larger than
the stripe to still benefit from idling.
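To make the RAID 0 intuition concrete, here is a minimal Python sketch that counts how many member disks a single request touches. It assumes that "stripe" means the per-disk chunk size and that chunks are laid out round-robin across the disks. The function name disks_touched and the 64 KB chunk size are hypothetical and chosen only for illustration.

```python
KB = 1024


def disks_touched(offset: int, size: int, chunk: int, ndisks: int) -> int:
    """Number of RAID 0 member disks covered by one request.

    Assumes round-robin chunk placement: chunk i lives on disk i % ndisks.
    All arguments are in bytes except ndisks.
    """
    first_chunk = offset // chunk
    last_chunk = (offset + size - 1) // chunk
    # Consecutive chunks land on consecutive disks, so the count is the
    # number of chunks spanned, capped at the number of disks.
    return min(ndisks, last_chunk - first_chunk + 1)


# Hypothetical 2-disk array with a 64 KB chunk.
for bs in (64 * KB, 128 * KB, 256 * KB):
    n = disks_touched(0, bs, 64 * KB, 2)
    print(f"bs={bs // KB:3d}K aligned request touches {n} disk(s)")
```

Under these assumptions, an aligned 64 KB request keeps only one disk busy, while a request of 128 KB or more already spans both, so idling on that queue would not leave a spindle unused.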

>
> So basically, the regression is not limited to that particular RAID card;
> it also shows up on other kinds of devices that have more than one spindle.
>
> I will also run some tests on a single SATA disk, where this patch should
> help.
>
> Based on the testing results so far, I am not a big fan of marking these
> mmap queues as sync-idle. I guess if this patch really helps, then we first
> need to put in place some kind of logic to detect whether the device is a
> single-spindle SATA disk, and only on such disks mark mmap queues as sync.
>
> Apart from synthetic workloads, where is this patch helping you in practice?

The synthetic workload mimics the page fault patterns that can be seen
on program startup, and that is the target of my optimization. In
2.6.32, we went in the direction of enabling idling for seeky queues
as well, while 2.6.33 tried to be friendlier to parallel storage
by usually allowing more parallel requests. Unfortunately, this
hurt this particular access pattern, so we need to fix it somehow.

Thanks,
Corrado

>
> Thanks
> Vivek
>
>
>> The second patch changes the seeky detection logic to be meaningful
>> also for SSDs. A seeky request is one that doesn't utilize the full
>> bandwidth for the device. For SSDs, this happens for small requests,
>> regardless of their location.
>> With this change, the grouping of "seeky" requests done by CFQ can
>> result in a fairer distribution of disk service time among processes.
>
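The seeky criterion described for the second patch can be sketched roughly as follows, in Python rather than kernel C. This is an illustrative sketch, not the patch's code. The Request type, the is_seeky function, and both threshold values are hypothetical. The seek-distance rule for rotational disks is an assumption added for contrast; only the "small requests, regardless of their location" rule for SSDs comes from the description above.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; not taken from the patch.
SEEK_DISTANCE_THR_SECTORS = 8 * 1024   # far jump on a rotational disk
SMALL_REQUEST_THR_SECTORS = 64         # too small to use full SSD bandwidth


@dataclass
class Request:
    start_sector: int
    nr_sectors: int


def is_seeky(req: Request, last_end_sector: int, rotational: bool) -> bool:
    """Classify one request as seeky or not.

    Rotational disk (assumed rule): seeky if the head has to travel far
    from where the previous request ended.
    SSD (rule from the description): seeky if the request is small,
    wherever it is located.
    """
    if rotational:
        distance = abs(req.start_sector - last_end_sector)
        return distance > SEEK_DISTANCE_THR_SECTORS
    return req.nr_sectors < SMALL_REQUEST_THR_SECTORS


far_but_large = Request(start_sector=1_000_000, nr_sectors=128)
print(is_seeky(far_but_large, last_end_sector=0, rotational=True))
print(is_seeky(far_but_large, last_end_sector=0, rotational=False))
```

With these made-up thresholds, the same far-away 128-sector request counts as seeky on a rotational disk but not on an SSD, which is the distinction the description draws.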
