Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

From: Ming Lei
Date: Tue Nov 26 2019 - 04:16:09 EST


On Tue, Nov 26, 2019 at 08:46:07AM +0100, Andrea Vai wrote:
> On Tue, 26/11/2019 at 10.32 +0800, Ming Lei wrote:
> > On Mon, Nov 25, 2019 at 07:51:33PM +0100, Andrea Vai wrote:
> > > On Mon, 25/11/2019 at 23.15 +0800, Ming Lei wrote:
> > > > On Mon, Nov 25, 2019 at 03:58:34PM +0100, Andrea Vai wrote:
> > > >
> > > > [...]
> > > >
> > > > > What should I try next?
> > > >
> > > > 1) cat /sys/kernel/debug/block/$DISK/hctx0/flags
> > > result:
> > >
> > > alloc_policy=FIFO SHOULD_MERGE|2
> > >
> > > >
> > > >
> > > > 2) echo 128 > /sys/block/$DISK/queue/nr_requests and run your
> > > > 1GB copy test again.
> > >
> > > Done, and it still fails. What should I try next?
> >
> > I just ran a 256M cp test
>
> I would like to point out that 256MB is a file size that usually
> doesn't trigger the issue (I don't know if it matters, sorry).

OK.

I tested 256M because IO timeouts are often triggered with qemu-ehci,
which is a long-standing issue. When the disk is attached via
qemu-xhci, the max request size increases from 120KB to 1MB, and the
IO pattern changes too. When the disk is attached via qemu-uhci, the
transfer is too slow (1MB/s) because the max endpoint size is too
small.

However, I just waited 16min and collected the complete IO log of the
1GB copy with the disk connected via qemu-uhci, and the sector of each
data IO is still in ascending order.
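A minimal sketch for repeating that check on a saved trace log. The
helper name check_write_order is made up, and it assumes each log line
ends with the "rwbs sector nr_sector" fields printed by the "%s %d %d"
format of the trace script quoted below:

```shell
#!/bin/sh
# check_write_order (hypothetical helper): report WRITE requests whose
# start sector is lower than the previous WRITE's start sector.
# Assumes each trace line ends with "rwbs sector nr_sector".
check_write_order() {
    awk '
        # rwbs starting with "W" marks a write (W, WS, WM, ...);
        # the numeric check skips the bcc header line
        NF >= 3 && $(NF-2) ~ /^W/ && $(NF-1) ~ /^[0-9]+$/ {
            sector = $(NF-1) + 0
            if (seen && sector < prev) {
                printf "out of order: %d after %d\n", sector, prev
                bad++
            }
            prev = sector
            seen = 1
            writes++
        }
        END {
            printf "%d writes, %d out of order\n", writes + 0, bad + 0
        }
    ' "$@"
}
```

Usage would be something like "check_write_order trace.log" on the
saved output; journal or metadata writes may legitimately land at other
sectors, so a few reports do not by themselves mean data IO is
reordered.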

>
> Another thing I would like to report is a strange behavior I
> noticed: yesterday I ran the test twice (as usual with a 1GB file)
> and it took 2370s and 1786s; a third test was still running when I
> stopped it. Then I started another set of 100 trials and let them run
> overnight: the first 10 trials took around 1000s, then the times
> gradually decreased to ~300s, and finally settled around 200s, with
> some trials below 70-80s. In other words, the times are extremely
> variable, and for the first time I noticed a sort of "performance
> increase" over time.

The 'cp' test does buffered IO. Can you reproduce the slowdown every
time by running the copy right after a fresh mount of the USB disk?
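A minimal sketch of such a run. DEV_PART, MNT and SRC are hypothetical
placeholders for your partition, mount point and 1GB source file, and
setting DRY_RUN only prints the commands instead of running them:

```shell
#!/bin/sh
# Hypothetical placeholders; override them from the environment.
DEV_PART=${DEV_PART:-/dev/sdX1}
MNT=${MNT:-/mnt/usb}
SRC=${SRC:-/root/test-1G.bin}

# run: execute a command, or only print it when DRY_RUN is set
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fresh_mount_copy() {
    # remount so that every run starts from the same state
    run umount "$MNT" 2>/dev/null
    run mount "$DEV_PART" "$MNT" || return 1
    start=$(date +%s)
    run cp "$SRC" "$MNT/"
    # cp only fills the page cache; umount waits for writeback
    run umount "$MNT"
    end=$(date +%s)
    echo "elapsed: $((end - start))s"
}
```

"DRY_RUN=1 fresh_mount_copy" just prints the command sequence; running
fresh_mount_copy several times as root without DRY_RUN would show
whether every fresh-mount run is slow or only some of them.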

>
> > to one USB storage device on the patched kernel, and the WRITE data
> > IO is really in ascending order. The filesystem is ext4, mounted
> > without '-o sync'. From the previous discussion, that looks like
> > exactly your test setting. The order can be observed via the
> > following script:
> >
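> > #!/bin/sh
> > # Usage: $0 MAJ MIN  (numbers from lsblk for the USB disk)
> > MAJ=$1
> > MIN=$2
> > # kernel-internal dev_t encoding: (major << 20) | minor
> > MAJ=$(( $MAJ << 20 ))
> > DEV=$(( $MAJ | $MIN ))
> > # print rwbs, start sector and sector count of each issued request
> > /usr/share/bcc/tools/trace -t -C \
> > 't:block:block_rq_issue (args->dev == '$DEV') "%s %d %d", args->rwbs, args->sector, args->nr_sector'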
> >
> > $MAJ & $MIN can be retrieved via lsblk for your USB storage disk.
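If it is easier, the MAJ:MIN pair from lsblk can be converted in one
step. The helper below is hypothetical; it only performs the same
shift/OR as the script above (MINORBITS is 20 in the kernel's
include/linux/kdev_t.h):

```shell
#!/bin/sh
# majmin_to_dev (hypothetical helper): convert lsblk's "MAJ:MIN"
# (e.g. "8:16") to the kernel-internal dev_t that the block
# tracepoints expose as args->dev, i.e. (major << 20) | minor.
majmin_to_dev() {
    maj=${1%%:*}   # text before the colon
    min=${1##*:}   # text after the colon
    echo $(( (maj << 20) | min ))
}

# Example with a real disk (/dev/sdb is a placeholder name):
#   majmin_to_dev "$(lsblk -dno MAJ:MIN /dev/sdb | tr -d ' ')"
```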
> >
> > So I think we need to check if the patch is applied correctly first.
> >
> > If your kernel tree is managed via git,
> Yes, it is.
>
> > please post 'git diff'.
> Attached. Is it patched correctly? Thanks.

Yeah, it looks correct, except that the change to
__blk_mq_delay_run_hw_queue() is duplicated.

>
>
> > Otherwise, share us your kernel version,
> BTW, it is 5.4.0+.
>
> > and I will send you a patch
> > backported to that kernel version.
> >
> > Meanwhile, you can collect the IO order log via the above script as
> > you did last time, then send us the log.
>
> OK, will try. Is it enough to run it for a short period of time
> (say, a few seconds) during the copy, or should I start it before
> the copy begins (or before the mount?) and stop it after the copy
> ends? (Please note that in the latter case a large amount of time,
> and data I suppose, would be involved because, as I said, to be sure
> the problem triggers I have to use a large file... but we can try to
> better understand and tune this. If it helps, you can get an ods
> file with the complete statistics at [1] (see the "prove_nov19"
> sheet).)

The data won't be very big: each line covers 120KB, so ~10K lines are
enough to cover a 1GB transfer, and a ~300KB compressed file should
hold the whole trace.
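The arithmetic behind that estimate, assuming full-size 120KB requests
and 512-byte sectors:

```latex
\frac{1\,\mathrm{GiB}}{120\,\mathrm{KiB}}
  = \frac{1048576\,\mathrm{KiB}}{120\,\mathrm{KiB}} \approx 8738 \text{ requests},
\qquad
\frac{120 \times 1024\,\mathrm{B}}{512\,\mathrm{B/sector}} = 240 \text{ sectors per request}
```

So a full-size WRITE should show up with nr_sector 240 in the trace,
and the data requests alone stay below the ~10K lines mentioned above.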


Thanks,
Ming