Re: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

From: Ming Lei
Date: Wed Nov 27 2019 - 08:09:20 EST


On Wed, Nov 27, 2019 at 10:39:40AM +0100, Andrea Vai wrote:
> Il giorno mer, 27/11/2019 alle 10.05 +0800, Ming Lei ha scritto:
> >
> >
> > It can be workaround via the following change:
> >
> > /lib/modules/5.4.0+/build/include/generated/autoconf.h:
> >
> > //#define CONFIG_CC_HAS_ASM_INLINE 1
>
> Thanks, it worked, trace attached. Produced by: start the trace script
> (with the pendrive already plugged), wait some seconds, run the test
> (1 trial, 1 GB), wait for the test to finish, stop the trace.
>
> The copy took 2659 seconds, roughly as already seen before.

Thanks for collecting the log.

>From the log, some of write IOs are out-of-order, such as, the 1st one
is 378880.

16.41240 2 266 266 kworker/2:1H block_rq_issue b'W' 370656 240
16.41961 3 485 485 kworker/3:1H block_rq_issue b'W' 378880 240
16.73729 2 266 266 kworker/2:1H block_rq_issue b'W' 370896 240
17.71161 2 266 266 kworker/2:1H block_rq_issue b'W' 379120 240
18.02344 2 266 266 kworker/2:1H block_rq_issue b'W' 371136 240
18.94314 3 485 485 kworker/3:1H block_rq_issue b'W' 379360 240
19.25624 2 266 266 kworker/2:1H block_rq_issue b'W' 371376 240

IO latency is increased a lot since the 1st out-of-order request(usb
storage HBA is single queue depth, one request can be issued only if
the previous issued request is completed).

The reason is that there are two kind of tasks which inserts rq to device.
One is the 'cp' process, the other is kworker/u8:*. The out-of-order
happens during the two task's interleaving.

Under such situation, I believe that the old legacy IO path may not
guarantee the order too. In blk_queue_bio(), after get_request()
allocates one request, the queue lock is released. And request is
actually inserted & issued from blk_flush_plug_list() under the
branch of 'if (plug)'. If requests are from two tasks, then request
is inserted/issued from two plug list, and no order can be guaranteed.

In my test, except for several requests from the beginning, all other
requests are inserted via the kworker thread(guess it is writeback wq),
that is why I can't observe the issue in my test.

As Schmid suggested, you may run the same test on old kernel with
legacy io path, and see if the performance is still good.

Also, could you share the following info about your machine? So that
I can build my VM guest in this setting for reproducing your situation
(requests are inserted from two types of threads).

- lscpu
- free -h
- lsblk -d $USB_DISK
- exact commands for mount the disk, and running the copy operation

Thanks,
Ming