Re: Shutdown causes fsck on USB disks

From: Theodore Ts'o
Date: Thu Mar 09 2023 - 15:19:37 EST


On Thu, Mar 09, 2023 at 09:05:08AM +0000, Chris Ward wrote:
> I have a (Ubuntu 22.04) system with a number of external USB disks.
> When I do a 'shutdown now' with these disks mounted, on the next start
> the disks have to be fsck-ed. So it seems that the disks are not
> unmounted cleanly on shutdown; maybe the disks report 'completion'
> before the data is really written, and the machine powers off before
> allowing the unmount writes to complete.
>
> Is this a kernel problem, or should I take it up with the Ubuntu maintainers ?

What file system do you have on these disks? And when you say that
they have to be fsck'ed, was this a fast fsck that just replayed the
journal, or a longer-running fsck that fixed corruption? Does this
happen reliably, every single time you run "shutdown now"? What
happens if you run "reboot now" instead, so that the machine is never
actually powered off?

The reason why I ask these questions is that even if the machine was
powered off without allowing the unmount to complete, for file systems
with journals, the journal replay should allow the file system to be
mounted without requiring a full fsck, and the file system should not
have any corruption issues that would need to be fixed by fsck.
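
To tell the two cases apart before fsck runs, one option is to look at
the feature flags in the superblock. The sketch below assumes ext4 and
that the "Filesystem features:" line printed by dumpe2fs -h lists
needs_recovery while the journal is waiting to be replayed. The
function name journal_status is made up for illustration.

```shell
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and reports
# whether the superblock says the journal still has to be replayed.
journal_status() {
    if grep '^Filesystem features:' | grep -qw 'needs_recovery'; then
        echo "journal replay pending"
    else
        echo "journal clean"
    fi
}

# Example (run as root, BEFORE the disk is mounted; /dev/sdX is a placeholder):
#   dumpe2fs -h /dev/sdX 2>/dev/null | journal_status
```

A mounted journaled file system normally carries the needs_recovery flag
as well, so the result only means something if you check the disk before
anything has mounted it.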

Assuming that you are using ext4, if the kernel discovers some
kind of file system inconsistency, it will set an indication that the
file system has issues that need to be fixed, and then on the next
reboot, a full fsck run will be triggered. You can check whether this
is the case by running dumpe2fs -h and checking the file system state.


# In a test kvm environment, mount a testing file system found on /dev/vdc
root@kvm-xfstests:~# mount /dev/vdc /vdc
[ 312.217838] EXT4-fs (vdc): mounted filesystem 43d49992-65a8-4fb4-b20e-f7c0682c2720 with ordered data mode. Quota mode: none.

# This is an example of how to trigger a file system corruption report
root@kvm-xfstests:~# echo testing > /sys/fs/ext4/vdc/trigger_fs_error
[ 315.186334] EXT4-fs error (device vdc): trigger_test_error:126: comm bash: testing

# Above is the sort of thing you will see on the console, and in your
# system logs, although if there is a real (non-testing) file system problem
# detected by the kernel, then you might see something like this instead:
# EXT4-fs error (device vdc): __ext4_find_entry:1531 inode #4512: comm main: reading directory block 0
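
If you want to look for such reports after the fact, a small filter over
the kernel log is enough. This is only a sketch: the function name
ext4_errors is made up, and the journalctl example assumes a systemd
system that keeps logs from the previous boot.

```shell
# Hypothetical helper: reads kernel log text on stdin and keeps only the
# lines where ext4 reported a file system error.
ext4_errors() {
    grep -E 'EXT4-fs error \(device [^)]+\)'
}

# Examples:
#   dmesg | ext4_errors                  # current boot
#   journalctl -k -b -1 | ext4_errors    # previous boot, if the journal is persistent
```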


# And this is what you will see if you run dumpe2fs and check on the filesystem state
root@kvm-xfstests:~# dumpe2fs -h /dev/vdc | grep state
dumpe2fs 1.47.0 (5-Feb-2023)
Filesystem state: clean with errors

# And then when you run fsck, you'll see a message stating that the file system
# contains errors. In the case of a real corruption, you'll probably see fsck
# reporting that one or more inconsistencies have been fixed. Nothing
# is reported this time since the file system inconsistency was triggered as
# a synthetic test via /sys/fs/ext4/$DEVICE/trigger_fs_error
root@kvm-xfstests:~# umount /dev/vdc
[ 317.173855] EXT4-fs (vdc): unmounting filesystem 43d49992-65a8-4fb4-b20e-f7c0682c2720.
root@kvm-xfstests:~# fsck.ext4 -p /dev/vdc
/dev/vdc contains a file system with errors, check forced. <=========================
/dev/vdc: 15/327680 files (0.0% non-contiguous), 42399/1310720 blocks
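
If you have several USB disks to check, the dumpe2fs step above can be
wrapped in a small helper. This is a sketch, not a definitive tool: the
function name fs_state and its output labels are made up, and it only
looks at the "Filesystem state:" line shown above.

```shell
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and classifies
# the "Filesystem state:" line.
fs_state() {
    state=$(sed -n 's/^Filesystem state:[[:space:]]*//p')
    case "$state" in
        *"with errors"*) echo "errors-flagged" ;;  # kernel recorded an error; full fsck will be forced
        "not clean"*)    echo "not-clean" ;;       # not cleanly unmounted (or currently mounted)
        clean)           echo "clean" ;;
        *)               echo "unknown" ;;
    esac
}

# Example (run as root; the device names are placeholders):
#   for dev in /dev/sdb1 /dev/sdc1; do
#       printf '%s: ' "$dev"; dumpe2fs -h "$dev" 2>/dev/null | fs_state
#   done
```

A disk that shows up as errors-flagged right after a shutdown points to
the kernel having detected a real inconsistency, not just an unmount that
did not finish.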


All of this being said, I suspect the most likely cause is actually a
problem with your hardware. Note that file system inconsistencies
could be caused by kernel bugs, but more commonly, they are caused by
storage failures due to flaky hardware: for example, if you purchased
cr*p USB thumb drives from the bargain bin at the checkout aisle of
your local Micro Center, or from a cheap street vendor selling hardware
that may have fallen off the back of a truck in the back alleys of
Shenzhen. :-)

So I always ask if you can replicate the problem on other systems and
using different storage devices.

- Ted