Hi Stefan!
So this is interesting. We can see the card is 100% busy. The IO submitted
to the card consists of small requests (18-38 KB per request) and each
request takes 0.3-0.5s to complete. So the resulting throughput is horrible,
only tens of KB/s. Also we can see there are many IOs queued for the
device in parallel (aqu-sz column). This does not look like the load I would
expect to be generated by downloading a large file from the web.
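As a rough back-of-the-envelope check of the numbers above, here is a small
sketch using Little's law (completions per second = requests in flight /
time per request). The function name and the queue_depth parameter are
only for illustration; queue_depth=1 is an assumption that the card
effectively services one request at a time, and the real aqu-sz values
have to be read from the iostat output.

```python
def throughput_kb_s(request_kb, await_s, queue_depth=1.0):
    """Estimate throughput in KB/s from request size and completion time.

    Little's law: completions per second = queue_depth / await_s.
    queue_depth=1.0 is an assumed value, not taken from the iostat output.
    """
    return queue_depth / await_s * request_kb


# Bounds taken from the observed ranges: 18-38 KB requests, 0.3-0.5 s each.
low = throughput_kb_s(request_kb=18, await_s=0.5)
high = throughput_kb_s(request_kb=38, await_s=0.3)

print(f"estimated throughput: {low:.0f} to {high:.0f} KB/s")
```

Under that assumption the estimate lands in the tens of KB/s up to roughly
a hundred KB/s, which is the same order of magnitude as what we see and
well below what dd(1) achieves on the same card.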
You have mentioned in previous emails that with dd(1) you can do a couple
of MB/s writing to this card, which is far more than these tens of KB/s. So the
file download must be doing something which really destroys the IO pattern
(and with mb_optimize_scan=0 ext4 happened to be better at dealing with it and
generating a better IO pattern). Can you perhaps strace the process doing the
download (or perhaps strace -f the whole rpi-update process) so that we can
see what the load generated on the filesystem looks like? Thanks!
Honza