Re: [PATCH RFC 01/14] block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler

From: Paolo Valente
Date: Tue Mar 14 2017 - 07:28:37 EST



> Il giorno 06 mar 2017, alle ore 21:46, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
>
> On 03/05/2017 09:02 AM, Paolo Valente wrote:
>>
>>> Il giorno 05 mar 2017, alle ore 16:16, Jens Axboe <axboe@xxxxxxxxx> ha scritto:
>>>
>>> On 03/04/2017 09:01 AM, Paolo Valente wrote:
>>>> We tag as v0 the version of BFQ containing only BFQ's engine plus
>>>> hierarchical support. BFQ's engine is introduced by this commit, while
>>>> hierarchical support is added by next commit. We use the v0 tag to
>>>> distinguish this minimal version of BFQ from the versions containing
>>>> also the features and the improvements added by next commits. BFQ-v0
>>>> coincides with the version of BFQ submitted a few years ago [1], apart
>>>> from the introduction of preemption, described below.
>>>>
>>>> BFQ is a proportional-share I/O scheduler, whose general structure,
>>>> plus a lot of code, are borrowed from CFQ.
>>>
>>> I'll take a closer look at this in the coming week.
>>
>> ok
>>
>>> But one quick
>>> comment - don't default to BFQ. Both because it might not be fully
>>> stable yet, and also because the performance limitation of it is
>>> quite severe. Whereas deadline doesn't really hurt single queue
>>> flash at all, BFQ will.
>>>
>>
>> Ok, sorry. I was doubtful on what to do, but, to not bother you on
>> every details, I went for setting it as default, because I thought
>> people would have preferred to test it, even from boot, in this
>> preliminary stage. I reset elevator.c in the submission, unless you
>> want me to do it even before receiving your and others' reviews.
>
> I don't think it's stable enough for that yet, it's seen very little
> testing outside of your own testing.

Hi Jens.

Yes, sorry, in general people of course use a kernel for a little bit
more than just testing bfq ...

> Given that, it's much better that
> people opt in to testing BFQ, so they at least know they can expect
> crashes.
>
> Speaking of testing, I ran into this bug:
>
> [ 9469.621413] general protection fault: 0000 [#1] PREEMPT SMP
> [ 9469.627872] Modules linked in: loop dm_mod xfs libcrc32c bfq_iosched x86_pkg_temp_thermal btrfe
> [ 9469.648196] CPU: 0 PID: 2114 Comm: kworker/0:1H Tainted: G W 4.11.0-rc1+ #249
> [ 9469.657873] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
> [ 9469.666742] Workqueue: xfs-log/nvme4n1p1 xfs_buf_ioend_work [xfs]
> [ 9469.674213] task: ffff881fe97646c0 task.stack: ffff881ff13d0000
> [ 9469.681053] RIP: 0010:__bfq_bfqq_expire+0xb3/0x110 [bfq_iosched]
> [ 9469.687991] RSP: 0018:ffff881fff603dd8 EFLAGS: 00010082
> [ 9469.694052] RAX: 6b6b6b6b6b6b6b6b RBX: ffff883fe8e0eb58 RCX: 0000000000010004
> [ 9469.702251] RDX: 0000000000010004 RSI: 0000000000000000 RDI: 00000000ffffffff
> [ 9469.710456] RBP: ffff881fff603de8 R08: 0000000000000000 R09: 0000000000000001
> [ 9469.718659] R10: 0000000000000001 R11: 0000000000000000 R12: ffff883fe8dbf4e8
> [ 9469.726863] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000007904
> [ 9469.735063] FS: 0000000000000000(0000) GS:ffff881fff600000(0000) knlGS:0000000000000000
> [ 9469.744539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9469.751189] CR2: 00000000020c0018 CR3: 0000001fec1db000 CR4: 00000000003406f0
> [ 9469.759392] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 9469.767596] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 9469.775794] Call Trace:
> [ 9469.778748] <IRQ>
> [ 9469.781222] bfq_bfqq_expire+0x104/0x2f0 [bfq_iosched]
> [ 9469.787193] ? bfq_idle_slice_timer+0x2a/0xc0 [bfq_iosched]
> [ 9469.793650] bfq_idle_slice_timer+0x7c/0xc0 [bfq_iosched]
> [ 9469.799914] __hrtimer_run_queues+0xd9/0x500
> [ 9469.804911] ? bfq_rq_enqueued+0x340/0x340 [bfq_iosched]
> [ 9469.811072] hrtimer_interrupt+0xb0/0x200
> [ 9469.815781] local_apic_timer_interrupt+0x31/0x50
> [ 9469.821264] smp_apic_timer_interrupt+0x33/0x50
> [ 9469.826555] apic_timer_interrupt+0x90/0xa0
>
> just running the xfstest suite. It's test generic/299. Have you done
> full runs of xfstest? I'd greatly recommend that for shaking out bugs.
> Run a full loop with xfs, one with btrfs, and one with ext4 for better
> confidence in the stability of the code.
>

Thanks for this information and suggestions. I'm running xfstests
right now, to reproduce at least the failure you spotted.

Thanks,
Paolo

> (gdb) l *__bfq_bfqq_expire+0xb3
> 0x5983 is in __bfq_bfqq_expire (block/bfq-iosched.c:2664).
> 2659 * been properly deactivated or requeued, so we can safely
> 2660 * execute the final step: reset in_service_entity along the
> 2661 * path from entity to the root.
> 2662 */
> 2663 for_each_entity(entity)
> 2664 entity->sched_data->in_service_entity = NULL;
> 2665 }
> 2666
> 2667 static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> 2668 bo