Re: io.latency controller apparently not working

From: Paolo Valente
Date: Mon Aug 19 2019 - 12:41:17 EST

In reply to: Paolo Valente: "Re: io.latency controller apparently not working"
Next in thread: Paolo Valente: "Re: io.latency controller apparently not working"

> On 16 Aug 2019, at 20:17, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
>
>
>> On 16 Aug 2019, at 19:59, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>>
>> On Fri, Aug 16, 2019 at 07:52:40PM +0200, Paolo Valente wrote:
>>>
>>>
>>>> On 16 Aug 2019, at 15:21, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Aug 16, 2019 at 12:57:41PM +0200, Paolo Valente wrote:
>>>>> Hi,
>>>>> I happened to test the io.latency controller, to make a comparison
>>>>> between this controller and BFQ. But io.latency seems not to work,
>>>>> i.e., not to reduce latency compared with what happens with no I/O
>>>>> control at all. Here is a summary of the results for one of the
>>>>> workloads I tested, on three different devices (latencies in ms):
>>>>>
>>>>>            no I/O control   io.latency   BFQ
>>>>> NVMe SSD   1.9              1.9          0.07
>>>>> SATA SSD   39               56           0.7
>>>>> HDD        4500             4500         11
>>>>>
>>>>> I have put all details on hardware, OS, scenarios and results in the
>>>>> attached pdf. For your convenience, I'm pasting the source file too.
>>>>>
>>>>
>>>> Do you have the fio jobs you use for this?
>>>
>>> The script mentioned in the draft (executed with the command line
>>> reported in the draft) executes one fio instance for the target
>>> process, and one fio instance for each interferer. I couldn't make do
>>> with just one fio instance executing all jobs, because the weight
>>> parameter doesn't work in fio jobfiles for some reason, and because
>>> the ioprio class cannot be set for individual jobs.
>>>
>>> In particular, the script generates a job with the following
>>> parameters for the target process:
>>>
>>> ioengine=sync
>>> loops=10000
>>> direct=0
>>> readwrite=randread
>>> fdatasync=0
>>> bs=4k
>>> thread=0
>>> filename=/mnt/scsi_debug/largefile_interfered0
>>> iodepth=1
>>> numjobs=1
>>> invalidate=1
>>>
>>> and a job with the following parameters for each of the interferers,
>>> in case, e.g., of a workload made of reads:
>>>
>>> ioengine=sync
>>> direct=0
>>> readwrite=read
>>> fdatasync=0
>>> bs=4k
>>> filename=/mnt/scsi_debug/largefileX
>>> invalidate=1
>>>
>>> Should you fail to reproduce this issue by creating groups, setting
>>> latencies and starting fio jobs manually, what if you try by just
>>> executing my script? Maybe this could help us spot the culprit more
>>> quickly.
>>
>> Ah ok, you are doing it on a mountpoint.
>
> Yep
>
>> Are you using btrfs?
>
> ext4
>
>> Cause otherwise
>> you are going to have a sad time.
>
> Could you elaborate more on this? I/O seems to be controllable on ext4.
>
>> The other thing is you are using buffered,
>
> Actually, the problem is suffered by sync random reads, which always
> hit the disk in this test.
>
>> which may or may not hit the disk. This is what I use to test io.latency
>>
>> https://patchwork.kernel.org/patch/10714425/
>>
>> I had to massage it since it didn't apply directly, but running this against the
>> actual block device, with O_DIRECT so I'm sure to measure the actual impact
>> of the controller, it all works out fine.
>
> I'm not getting why non-direct sync reads, or buffered writes, should
> be uncontrollable. As a trivial example, BFQ in these tests controls
> I/O as expected, and keeps latency extremely low.
>
> What am I missing?
>

While waiting for your answer, I've also added the direct-I/O case to
my test. This new case, too, is reproduced by the command line
reported in the draft.
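
As a rough illustration only (this is not the script from the draft), the
direct-I/O variant of the target job can be sketched by rendering the
parameters quoted earlier in this thread into a fio job file, with direct
switched from 0 to 1. The job name "target" and the helper render_fio_job
are hypothetical, and it is an assumption that the script selects direct
I/O through this parameter alone.

```python
# Sketch: render the target job's parameters as a fio job file.
# The parameter values are those quoted earlier in the thread; only
# direct=1 differs (assumption: this is what selects the direct-I/O case).
TARGET_PARAMS = {
    "ioengine": "sync",
    "loops": 10000,
    "direct": 1,
    "readwrite": "randread",
    "fdatasync": 0,
    "bs": "4k",
    "thread": 0,
    "filename": "/mnt/scsi_debug/largefile_interfered0",
    "iodepth": 1,
    "numjobs": 1,
    "invalidate": 1,
}


def render_fio_job(name, params):
    """Return fio job-file text: a [name] section followed by key=value lines."""
    lines = [f"[{name}]"]
    lines += [f"{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


# "target" is a hypothetical job name chosen for this example.
job_text = render_fio_job("target", TARGET_PARAMS)
print(job_text)
```

The printed text can be saved to a file and passed to fio as a job file.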

Even with direct I/O, nothing changes with writers as interferers,
except that on the HDD the latency with io.latency now at least matches
the no-I/O-control case. Summing up, with writers as interferers
(latency in ms):

           no I/O control   io.latency   BFQ
NVMe SSD   3                3            0.2
SATA SSD   3                3            0.2
HDD        56               56           13

In contrast, there are important improvements with the SSDs, in case
of readers as interferers. This is the new situation (latency still
in ms):

           no I/O control   io.latency   BFQ
NVMe SSD   1.9              0.08         0.07
SATA SSD   39               0.2          0.7
HDD        4500             118          11
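
To make the size of these improvements easier to compare, the figures in
the table above can be turned into reduction factors (latency with no I/O
control divided by latency with each controller). This is a minimal
sketch: the numbers are copied from the table, while the dictionary
layout and the helper reduction_factors are hypothetical names introduced
for the example.

```python
# Latencies in ms, copied from the readers-as-interferers table above.
LATENCY_MS = {
    "NVMe SSD": {"none": 1.9, "io.latency": 0.08, "BFQ": 0.07},
    "SATA SSD": {"none": 39, "io.latency": 0.2, "BFQ": 0.7},
    "HDD": {"none": 4500, "io.latency": 118, "BFQ": 11},
}


def reduction_factors(table):
    """For each device, divide the uncontrolled latency by each controller's."""
    return {
        device: {
            controller: row["none"] / latency
            for controller, latency in row.items()
            if controller != "none"
        }
        for device, row in table.items()
    }


factors = reduction_factors(LATENCY_MS)
for device, row in factors.items():
    print(f"{device:9s} io.latency: {row['io.latency']:6.1f}x  BFQ: {row['BFQ']:6.1f}x")
```

A factor above 1 means the controller lowered latency relative to no I/O
control; the larger the factor, the larger the reduction.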

Thanks,
Paolo

> Thanks,
> Paolo
>
>> Thanks,
>>
>> Josef
