Re: [linus:master] [block] e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression

From: Niklas Cassel
Date: Wed Jan 08 2025 - 05:39:43 EST


On Tue, Jan 07, 2025 at 04:27:44PM +0800, Oliver Sang wrote:
> hi, Niklas,
>
> On Fri, Jan 03, 2025 at 10:09:14AM +0100, Niklas Cassel wrote:
> > On Fri, Jan 03, 2025 at 07:49:25AM +0100, Christoph Hellwig wrote:
> > > On Thu, Jan 02, 2025 at 10:49:41AM +0100, Niklas Cassel wrote:
> > > > > > from below information, it seems an 'ahci' to me. but since I have limited
> > > > > > knowledge about storage driver, maybe I'm wrong. if you want more information,
> > > > > > please let us know. thanks a lot!
> > > > >
> > > > > Yes, this looks like ahci. Thanks a lot!
> > > >
> > > > Did this ever get resolved?
> > > >
> > > > I haven't seen a patch that seems to address this.
> > > >
> > > > AHCI (ata_scsi_queuecmd()) only issues a single command, so if there is any
> > > > reordering when issuing a batch of commands, my guess is that the problem
> > > > also affects SCSI / the problem is in upper layers above AHCI, i.e. SCSI lib
> > > > or block layer.
> > >
> > > I started looking into this before the holidays. blktrace shows perfectly
> > > sequential writes without any reordering using ahci, directly on the
> > > block device or using xfs and btrfs when using dd. I also started
> > > looking into what the test does and got as far as checking out the
> > > stress-ng source tree and looking at stress-aiol.c. AFAICS the default
> > > submission does simple reads and writes using increasing offsets.
> > > So if the test result isn't a fluke either the aio code does some
> > > weird reordering or btrfs does.
> > >
> > > Oliver, did the test also show any interesting results on non-btrfs
> > > setups?
> > >
> >
> > One thing that came to mind.
> > Some distros (e.g. Fedora and openSUSE) ship with an udev rule that sets
> > the I/O scheduler to BFQ for single-queue HDDs.
> >
> > It could very well be the I/O scheduler that reorders.
> >
> > Oliver, which I/O scheduler are you using?
> > $ cat /sys/block/sdb/queue/scheduler
> > none mq-deadline kyber [bfq]
>
> while our test running:
>
> # cat /sys/block/sdb/queue/scheduler
> none [mq-deadline] kyber bfq

The stddev numbers you showed are all over the place, so are we certain
that this regression is caused by commit e70c301faece ("block: don't
reorder requests in blk_add_rq_to_plug")?

Do you know if the stddev showed such a large variation for this test even
before the commit?


If it is not too much to ask, it would be interesting to know whether we see
a regression when comparing before/after e70c301faece with scheduler none
instead of mq-deadline.
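For reference, here is a rough sketch of how the scheduler could be switched
for such a comparison. The device name sdb is taken from the output quoted
above. The helper names are made up for illustration, and writing to sysfs
requires root:

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, and the device (sdb in
# the thread) is passed in by the caller. Writing to sysfs requires root.

# Print the active I/O scheduler, i.e. the entry shown in [brackets],
# from a scheduler file such as /sys/block/sdb/queue/scheduler.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Select a scheduler for a block device, and fail if it did not take effect.
set_scheduler() {
    file="/sys/block/$1/queue/scheduler"
    echo "$2" > "$file" || return 1
    [ "$(active_scheduler "$file")" = "$2" ]
}
```

That is, run "set_scheduler sdb none" before running the aiol test on both
kernels, then "set_scheduler sdb mq-deadline" to restore the original setup.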


Kind regards,
Niklas