Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

From: Theodore Y. Ts'o
Date: Tue Dec 10 2019 - 21:42:11 EST


On Tue, Dec 10, 2019 at 04:05:50PM +0800, Ming Lei wrote:
> > > The path[2] is expected behaviour. Not sure path [1] is correct,
> > > given
> > > ext4_release_file() is supposed to be called when this inode is
> > > released. That means the file is closed 4358 times during 1GB file
> > > copying to usb storage.
> > >
> > > [1] insert requests when returning to user mode from syscall
> > >
> > > b'blk_mq_sched_request_inserted'
> > > b'blk_mq_sched_request_inserted'
> > > b'dd_insert_requests'
> > > b'blk_mq_sched_insert_requests'
> > > b'blk_mq_flush_plug_list'
> > > b'blk_flush_plug_list'
> > > b'io_schedule_prepare'
> > > b'io_schedule'
> > > b'rq_qos_wait'
> > > b'wbt_wait'
> > > b'__rq_qos_throttle'
> > > b'blk_mq_make_request'
> > > b'generic_make_request'
> > > b'submit_bio'
> > > b'ext4_io_submit'
> > > b'ext4_writepages'
> > > b'do_writepages'
> > > b'__filemap_fdatawrite_range'
> > > b'ext4_release_file'
> > > b'__fput'
> > > b'task_work_run'
> > > b'exit_to_usermode_loop'
> > > b'do_syscall_64'
> > > b'entry_SYSCALL_64_after_hwframe'
> > > 4358

I'm guessing that your workload is repeatedly truncating a file (or
calling open with O_TRUNC) and then writing data to it. When you do
this, then when the file is closed, we assume that since you were
replacing the previous contents of a file with new contents, you
would be unhappy if the file contents were replaced by a zero-length
file after a crash. That's because ten years ago, there were a *huge*
number of crappy applications that would replace a file by reading it
into memory, truncating it, and then writing out the new contents of
the file. This could be a high score file for a game, or a KDE or
GNOME state file, etc.

So if someone does open, truncate, write, close, we are still
immediately writing out the data on the close, assuming that the
programmer really wanted open, truncate, write, fsync, close, but was
too careless to actually do the right thing.
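
For illustration, here is a minimal userspace sketch of the open,
truncate, write, fsync, close sequence described above. The helper
name replace_file_contents() and the 0644 mode are made up for this
example; they do not come from any real application or from the
kernel. Note that even with the fsync, a crash between the truncate
and the fsync can still leave a short or empty file, so this only
shows the explicit flush, not a fully crash-safe replace.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and mode are assumptions for this sketch):
 * replace the contents of `path` with `len` bytes from `buf`, and
 * flush the data explicitly before closing.
 * Returns 0 on success, -1 on failure with errno set.
 */
int replace_file_contents(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    /* open + truncate in one step */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write, handling short writes and EINTR */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* the explicit fsync, instead of relying on a flush at close */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return close(fd);
}
```

A caller would use it as replace_file_contents("state.txt", data, len)
and check for a return value of -1.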

Some workaround[1] like this is done by all of the major file systems,
and was fallout from the agreement reached after the "O_PONIES"[2]
controversy. This was discussed and agreed to at the 2009 LSF/MM
workshop[3]. (See the "rename, fsync, and ponies" section.)

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[2] https://blahg.josefsipek.net/?p=364
[3] https://lwn.net/Articles/327601/

So if you're seeing a call to filemap_fdatawrite_range as the result
of a fput, that's why.

In any case, this behavior has been around for a decade, and it
appears to be incidental to your performance difficulties with your
USB thumbdrive and block-mq.

- Ted