Re: Noticeable usability weakness of ext4 recovery, especially during boot
From: Andreas Mohr
Date: Sat Mar 26 2016 - 13:04:05 EST
Hello,
On Sun, Mar 20, 2016 at 05:51:23PM -0400, Theodore Ts'o wrote:
> On Sun, Mar 20, 2016 at 06:10:53PM +0100, Andreas Mohr wrote:
> > [ 18.164828] EXT4-fs (sda2): mounting ext3 file system using the ext4 subsyste
> > [ 248.211176] EXT4-fs (sda2): recovery complete
> > [ 249.595977] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts:
> > (null)
> >
> > (note the HUGE gap in dmesg timestamps above)
> >
> > All that *may* have been acceptable, if it were not for the fact
> > that especially during boot
> > certain activity progress is not readily visible,
> > thus there's a *lot* of confusion of what is [not] happening with the system
> > (as strongly opposed to a situation where one is mounting an ext4 partition
> > directly via the shell, where it's quite obvious that mount is hanging
> > right at the shell prompt,
> > and diagnostic verification via dmesg etc. is readily reachable,
> > as opposed to system boot).
>
> I'm curious --- what kind of removable USB drive is your sda drive?
Bus 001 Device 002: ID 0951:1665 Kingston Technology
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0951 Kingston Technology
idProduct 0x1665
bcdDevice 1.00
iManufacturer 1 Kingston
iProduct 2 DataTraveler 2.0
Case label: DTSE9 64GB
USB 2.0 stick on USB 2.0 port.
Both hdparm -i and hdparm -I did not work, interestingly:
# hdparm -i /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
HDIO_GET_IDENTITY failed: Invalid argument
# hdparm -I /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ATA device, with non-removable media
Model Number: ïHïïïHïïïHïïHï ïHï@ïHï`ïHïïïHïïïH
Serial Number: ï`ïIï ïHïïïHïïHï ïH
Firmware Revision: ïHïïïHïï
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 52736 0
--
bytes/track: 1021 bytes/sector: 0
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Buffer type: 9340: dual port, multi-sector with read caching ability
Buffer size: 31396.0kB bytes avail on r/w long: 37728
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0
(but these hdparm queries also fail when testing another USB stick -
on that machine at least,
so I'm not certain any more whether such behaviour is expected or not -
but I did sorta expect a plain IDENTIFY to work)
Just tested second USB stick on another machine - IDENTIFY failed, too
(also with sg_ident),
so that seems "normal" behaviour.
> Normally journal replay does *not* take nearly four minutes. It
> usually only takes at most a few seconds --- which is why we've not
> bothered to put in any kind of progress notification scheme. (Which
> on most Linux distribution would be covered by some boot graphics
> anyway, so it would be largely wasted.)
True, from recollection
for various devices
ext* recovery is a lot faster than a standard full fsck.
Makes it all the more mysterious why it now happened that way here.
> There *is* such a progress bar system for a file system consistency
> check, which normally is what can take minutes.
>
> > - progress may easily not be determinable via "out of band signalling" ;)
> > (completely silent USB stick rather than plain old metal platters HDD!!)
>
> Ah, it's some USB stick. I'm guessing it must be a *really* crappy
> USB stick that doesn't do random writes well at all. Even so, though
> this is quite unusual. This is much more normal:
>
> [8583.604009] sd 9:0:0:0: [sdc] Attached SCSI removable disk
> [8589.049603] EXT4-fs (sdc1): recovery complete
> [8589.055767] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
>
> This is with an amazingly craptastic 1GB USB stick drive made of
> plastic which looks like it would fall apart if you twisted it
> slightly as you insert or remove it, and for which I paid less than
> $4.00 USD. (Basically something I treat as completely disposable. :-)
Hmm, yeah, it might have started to happen now
since device seems to be more in use now
(read: pristine factory status now gone, thus all blocks used,
thus it now actively needs to do erase block processing,
and if that has annoying weaknesses, then......).
So yeah, that might hint at annoyingly strong weaknesses
(annoyingly weak strongness ;)
in flash management implementation.
bonnie++ -s512 -r256 -d bonnie_test
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
mint 512M 110 72 5173 3 5721 1 704 98 +++++ +++ 91.0 1
Latency 6428ms 2545ms 2241ms 26638us 64us 7248us
Version 1.97 ------Sequential Create------ --------Random Create--------
mint -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 1624 7 +++++ +++ 3263 11 3676 19 +++++ +++ 10935 49
Latency 1061us 1950us 1766us 555us 257us 993us
1.97,1.97,mint,1,1458889676,512M,,110,72,5173,3,5721,1,704,98,+++++,+++,91.0,1,16,,,,,1624,7,+++++,+++,3263,11,3676,19,+++++,+++,10935,49,6428ms,2545ms,2241ms,26638us,64us,7248us,1061us,1950us,1766us,555us,257us,993us
I haven't had the "opportunity" to do another reboot yet,
thus I couldn't check whether behaviour is now different.
> > Thus, the question here would be
> > which parts may need to be sufficiently improved
> > to achieve an acceptable level of usability.
> > I'd think that even simply adding a message
> > that one is about to enter a potentially longer-term operation
> > (with a details message such as "recovery")
> > rather than "standard" mount activity
> > would be sufficient,
> > both for text boot *and* GUI boot with some sufficiently accessible
> > "boot details" window.
>
> Well, we could, but on most devices, it's largely wasted. Perhaps the
> more interesting question is why is your USB stick so slow? Because
> that's going to have a much larger usability issue for you...
Hmm, ok, I guess I could accept
not having a pre-recovery announcement done
if recovery is fully intended to be rather fast always.
Still, in my case status reporting then ended up sufficiently irritating,
due to rather unexpected behaviour (probably of this stick).
> Cheers,
>
> - Ted
Thank you!
(and apologies for the large delay in replying)
Andreas Mohr
--
GNU/Linux. It's not the software that's free, it's you.