Re: [PATCH] block: BFQ default for single queue devices
From: Ulf Hansson
Date: Thu Oct 04 2018 - 05:57:01 EST

On 3 October 2018 at 19:34, Bryan Gurney <bgurney@xxxxxxxxxx> wrote:
> On Wed, Oct 3, 2018 at 11:53 AM, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>>
>>
>>> On 3 Oct 2018, at 10:28, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
>>>
>>> On Wed, Oct 3, 2018 at 9:42 AM Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:
>>>
>>>> There is another class of outliers: host-managed SMR disks (SATA and SCSI,
>>>> definitely single hw queue). For these, using mq-deadline is mandatory in many
>>>> cases in order to guarantee sequential write command delivery to the device
>>>> driver. Having the default changed to bfq, which as far as I know is not SMR
>>>> friendly (can sequential writes within a single zone be reordered?), is asking
>>>> for trouble (unaligned write errors showing up).
>>>
>>> Ah, that is interesting.
>>>
>>> Which device driver files are we talking about here, specifically?
>>> I'd like to take a look.
>>>
>>> I guess what you say is not that you are looking for the deadline
>>> scheduling per se (as in deadline scheduling is nice), what you want is
>>> the zone locking semantics in that scheduler, is that right?
>>>
>>> I.e. this business:
>>> blk_queue_is_zoned(q)
>>> blk_req_zone_write_lock(rq);
>>> blk_req_zone_write_unlock(rq);
>>> and mq-deadline solves this with a spinlock.
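
To make the zone locking idea above concrete, here is a rough standalone
model of "at most one write in flight per zone". It is only a userspace
sketch, not the block-layer code: struct zone_locks, zone_write_trylock()
and zone_write_unlock() are made-up names, the 64-zone limit is arbitrary,
and the model is single-threaded (a real scheduler would hold its own lock
around these checks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ZONES 64u /* arbitrary limit for this sketch */

/* Hypothetical per-device state: bit i set means zone i has a write in flight. */
struct zone_locks {
	uint64_t wlock;
};

/* Try to claim a zone for one write; fails if a write is already in flight. */
static bool zone_write_trylock(struct zone_locks *zl, unsigned int zone)
{
	assert(zone < NR_ZONES);
	uint64_t bit = UINT64_C(1) << zone;

	if (zl->wlock & bit)
		return false; /* hold this write back, keep it queued in order */
	zl->wlock |= bit;
	return true;
}

/* Release the zone once the in-flight write has completed. */
static void zone_write_unlock(struct zone_locks *zl, unsigned int zone)
{
	assert(zone < NR_ZONES);
	zl->wlock &= ~(UINT64_C(1) << zone);
}
```

A dispatcher would call zone_write_trylock() before issuing a write and
zone_write_unlock() on completion, so a second write to the same zone can
never overtake the first one.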
>>>
>>> I will augment the patch to enforce mq-deadline
>>> if blk_queue_is_zoned(q) is true, as it is clear that
>>> any device with that characteristic must use mq-deadline.
>>>
>>> Paolo might be interested in looking into whether BFQ could
>>> also handle zoned devices in the future, I have no idea of how
>>> hard that would be.
>>>
>>
>> Absolutely, as I already wrote in my reply to Damien.
>>
>> In the meantime, Linus, augmenting your patch as you propose seems
>> a clean and effective solution to me.
>>
>> Thanks,
>> Paolo
>>
>>> The zoned business seems a bit fragile. Should it even be
>>> allowed to select any other scheduler than deadline on these
>>> devices? Presenting all compiled-in schedulers in
>>> /sys/block/<device>/queue/scheduler sounds like just giving
>>> sysadmins too much rope.
>>>
>>> Yours,
>>> Linus Walleij
>>
>
> Right now, users of host-managed SMR drives should be using "deadline"
> or "mq-deadline", to avoid out-of-order writes in sequential-only
> zones.
>
> I'm running into a situation right now on a test system (Fedora 28,
> 4.18.7 kernel) where I copied test data onto an F2FS filesystem, but I
> accidentally forgot to add my "udev rule" file:
>
> # cat /etc/udev/rules.d/99-zoned-block-devices.rules
> ACTION=="add|change", KERNEL=="sd[a-z]", ATTRS{queue/zoned}=="host-managed", ATTR{queue/scheduler}="deadline"
>
> ...and now, I see these messages when that specific SMR drive is mounted:
>
> kernel: F2FS-fs (sdc): IO Block Size: 4 KB
> kernel: F2FS-fs (sdc): Found nat_bits in checkpoint
> kernel: F2FS-fs (sdc): Mounted with checkpoint version = 212216ab
> kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08),
> sub_code(0x0000)
> kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08),
> sub_code(0x0000)
> kernel: scsi_io_completion: 20 callbacks suppressed
> kernel: sd 7:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> kernel: sd 7:0:0:0: [sdb] tag#0 Sense Key : Aborted Command [current]
> kernel: sd 7:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
> kernel: sd 7:0:0:0: [sdb] tag#0 CDB: Write(16) 8a 00 00 00 00 00 3d d4
> ec 99 00 00 00 80 00 00
>
> I was also running into problems with creating new directories on this
> F2FS filesystem. However, "fsck.f2fs" reports no problems. So at
> this point, I created a new F2FS filesystem on a second SMR drive, and
> am currently copying the data from the "bad" F2FS filesystem to the
> "good" one.
>
> I wouldn't call zoned block devices "fragile"; they simply have I/O
> rules that didn't previously exist: all writes to sequential-only
> zones must be sequential. And one of the things that schedulers do is
> reorder writes. After 4.16, sd stopped being the "gatekeeper" of
> ensuring sequential writes, but the only "zoned-aware" schedulers were
> deadline and mq-deadline. Since my test system defaulted to "cfq", I
> ran into problems.
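
The sequential-write rule described above can be modeled in a few lines.
This is a simplified sketch, not drive firmware or kernel code: struct
zone_model and zone_write() are hypothetical names, and the only rule
modeled is that a write must start exactly at the zone's write pointer
and fit in the zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone_model {
	uint64_t start; /* first LBA of the zone */
	uint64_t len;   /* zone length in sectors */
	uint64_t wp;    /* write pointer: the only LBA a write may start at */
};

/*
 * Returns true and advances the write pointer if the write is accepted.
 * Returns false for a write that does not start at the write pointer
 * (the "unaligned write" case) or that would run past the end of the zone.
 */
static bool zone_write(struct zone_model *z, uint64_t lba, uint64_t nr_sectors)
{
	assert(z->wp >= z->start && z->wp <= z->start + z->len);

	if (lba != z->wp)
		return false;
	if (nr_sectors > z->start + z->len - z->wp)
		return false;
	z->wp += nr_sectors;
	return true;
}
```

With two writes queued for LBA 128 and LBA 192, issuing them in that
order succeeds, while a scheduler that swaps them gets the second one
rejected, which is the failure mode seen with a non-zoned-aware scheduler.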
>
> So I welcome any changes that make it impossible for the user to
> "accidentally use the wrong scheduler".

I fully agree.

>
> At least this time, I didn't "brick" my test system's BIOS, like I did
> back in May of this year [1].

It sounds to me like the kernel isn't doing its job. In particular,
the kernel has all the information it needs to select the proper
I/O scheduler (the block layer could just check
BLK_ZONE_TYPE_SEQWRITE_REQ/ZBC_ZONE_TYPE_SEQWRITE_REQ). Instead, it
relies on userspace to do the right thing; that can't be right.
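
Roughly, the selection could look like the sketch below. This is a
standalone userspace model, not a patch: enum zoned_model and
pick_default_sched() are made-up names, and falling back to "none" for
multi-queue devices is an assumption on my side. The bfq choice for
single-queue devices is simply what the patch under discussion proposes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the zoned model the block layer knows about. */
enum zoned_model { ZONED_NONE, ZONED_HOST_AWARE, ZONED_HOST_MANAGED };

/* Pick a default scheduler name without relying on userspace rules. */
static const char *pick_default_sched(unsigned int nr_hw_queues,
				      enum zoned_model model)
{
	assert(nr_hw_queues > 0);

	/* Host-managed zones need ordered writes: force the zoned-aware scheduler. */
	if (model == ZONED_HOST_MANAGED)
		return "mq-deadline";
	/* Single hw queue: the default proposed by the patch. */
	if (nr_hw_queues == 1)
		return "bfq";
	/* Assumption: leave multi-queue devices without a scheduler. */
	return "none";
}
```

The zoned check comes first on purpose, so that a single-queue
host-managed disk never ends up on bfq by default.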

Kind regards
Uffe