Spurious disk change detections interfere with disk access
From: Jörg Henne
Date: Fri Apr 29 2005 - 15:34:19 EST
Using a CompactFlash-based system, I am experiencing very weird
behaviour that I just don't understand.
The system is based on
- a CompactFlash IDE drive attached to a real IDE controller (in
contrast to a PC-Card adapter)
- the root filesystem resides on the CF
While booting, there are irregular but frequent errors which manifest as
hda1: bad access: block=<something> count=<something>
which are followed by error messages from the filesystem and finally
errors in the boot process itself.
I added some more debug output to drivers/ide/ide-io.c and found that
while the block offset was well within the nominal bounds of the device,
the message was actually caused by the fact that while the error
occurred, the partition table was all zeroed-out, i.e.
drive->part[minor&PARTN_MASK].nr_sects == 0.
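
To illustrate why a zeroed entry rejects every request, the bounds check can be modelled roughly as follows. This is a user-space sketch, not the code in drivers/ide/ide-io.c: the struct and function names are hypothetical, and only the nr_sects field name is borrowed from the expression above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one entry of drive->part[]. */
struct part_entry {
    unsigned long nr_sects;  /* partition length in sectors; 0 while the table is cleared */
};

/*
 * Returns true if a request for `count` sectors starting at the
 * partition-relative sector `block` fits inside the partition.
 * With nr_sects == 0, any non-empty request is rejected, no matter
 * how small `block` is.
 */
static bool request_in_bounds(const struct part_entry *p,
                              unsigned long block, unsigned long count)
{
    /* Test count first so that nr_sects - count cannot wrap around. */
    if (count > p->nr_sects || block > p->nr_sects - count) {
        fprintf(stderr, "bad access: block=%lu count=%lu\n", block, count);
        return false;
    }
    return true;
}
```

A request that is perfectly valid against the real partition size therefore fails for as long as the in-memory entry reads zero.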
The reason why the drive's partition table is zeroed-out temporarily
seems to be that there is an ide_revalidate_disk() going on at that
time, which in turn is explained by messages like
VFS: Disk change detected on device 03:00
/dev/ide/host0/bus0/target0/lun0: p1 p2
which precede the "bad access" message.
As I see it, the core of the problem is:
- something accesses the devfs directory associated with the drive (ls
/dev/ide/host0/bus0/target0/lun0 will do)
- devfs runs a check_media_change which ends up at
ide-disk.c:idedisk_media_change() which is implemented like this:
/* if removable, always assume it was changed */
- devfs decides that a revalidate is appropriate since the media has
changed, which trashes the partition table for a brief moment
- processes accessing the drive at that time see lots of errors
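
The sequence above can be reproduced in a small user-space model. Everything here is a hypothetical simplification for illustration (the model_* names, the fixed table size, the split of the revalidate into two halves); it only mirrors the behaviour described in the list, not the actual devfs or IDE code.

```c
#include <stdbool.h>

#define MAX_PARTS 4  /* hypothetical table size for this model */

struct model_drive {
    bool removable;
    unsigned long nr_sects[MAX_PARTS];  /* stand-in for drive->part[i].nr_sects */
    unsigned long on_disk[MAX_PARTS];   /* sizes a rescan of the medium would find */
};

/* Models the quoted comment: a removable drive always reports a change. */
static bool model_media_change(const struct model_drive *d)
{
    return d->removable;
}

/* First half of the revalidate: the in-memory table is cleared. */
static void model_revalidate_begin(struct model_drive *d)
{
    for (int i = 0; i < MAX_PARTS; i++)
        d->nr_sects[i] = 0;
}

/* Second half: the table is rebuilt from what is on the medium. */
static void model_revalidate_end(struct model_drive *d)
{
    for (int i = 0; i < MAX_PARTS; i++)
        d->nr_sects[i] = d->on_disk[i];
}

/* An I/O request is checked against the in-memory table only. */
static bool model_io_ok(const struct model_drive *d, int part,
                        unsigned long block)
{
    return block < d->nr_sects[part];
}
```

Any request that lands between model_revalidate_begin() and model_revalidate_end() fails even though the medium never changed, which matches the window in which the "bad access" messages show up.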
Is it really true that, for removable devices, a simple directory access
on devfs can interfere with accesses to the drive? That sounds like a
great starting point for DoSing the system.
I see two possible solutions for the problem:
- A lock is put in place which guards access to the drive while the
partition table is re-built. Seems fairly hacky to me.
- The media change detection mechanism is improved, so that it can
detect when the media REALLY changed.
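
As a rough idea of what the second option could look like, the sketch below reports a change only when the identity of the inserted medium differs from the one seen at the previous check. This is an assumption about one possible approach, not existing kernel behaviour: the struct and function names are hypothetical, and using the serial-number and model strings from the drive's identify data as the identity is my own choice for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical identity record, sized for the serial-number and model
 * strings of an ATA IDENTIFY DEVICE response. */
struct media_id {
    char serial[21];  /* 20 characters plus terminating NUL */
    char model[41];   /* 40 characters plus terminating NUL */
};

struct tracked_drive {
    bool removable;
    bool have_id;            /* false until a first identity is cached */
    struct media_id cached;  /* identity seen at the previous check */
};

/*
 * Returns true only if the medium appears to differ from the one seen
 * last time. `now` is the identity just read from the device; reading
 * it is outside the scope of this sketch.
 */
static bool media_really_changed(struct tracked_drive *d,
                                 const struct media_id *now)
{
    if (!d->removable)
        return false;

    if (d->have_id &&
        strcmp(d->cached.serial, now->serial) == 0 &&
        strcmp(d->cached.model, now->model) == 0)
        return false;  /* same card still inserted: no revalidate needed */

    d->cached = *now;  /* remember the new medium for the next check */
    d->have_id = true;
    return true;
}
```

With a check like this, repeated directory accesses would no longer trigger a revalidate while the same card stays inserted. It would still miss a card that was pulled, modified elsewhere and reinserted, so it is only a starting point.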
Thanks in advance for any input on this.