On Mon, 11 Jul 2016, Mark Hounschell wrote:
Well, all that was specified in my original post. I can no longer open the
floppy drive with no floppy media inserted. Worse, I can also no longer open a
floppy with media inserted that is not a "linux" recognized format. A floppy
drive is a removable media device and should be treated as such. The original
implementation of the O_NDELAY flag allowed it to be.
Any removable media device should be capable of being opened with no media, or even
with unrecognizable media, installed. The kernel and its utilities should not
"assume" too much when it comes to removable media. Consider a SCSI tape drive
or even a removable media SCSI disk drive. How would you explain an open
failure to someone trying to open a SCSI tape drive that had no tape or even a
"non-tar" formatted tape media in it???
Or better yet, trying to open a removable media device that was write-protected
but didn't include O_RDONLY in the open?

Alright, so you are basically using the O_NDELAY flag as a way to
avoid check_disk_change() being called. It's rather a coincidence that it
has worked this way, but I agree with you that we can't ignore the fact
that there is userspace relying on this behavior.

The original behavior of the floppy driver was correct. I have no idea
what BUG these changes were supposed to fix but the "fix" obviously
broke user land. Was this bug reported by some new ROBOT test or
something? The kernel floppy driver has been stable for years now

That's not really true; the code is a racy mess, and this has only been
uncovered since virtualized floppy devices started to exist (because
they are much faster than real hardware, and the different timing
reveals bugs that were not visible before).
This particular fix was made because syzkaller found a way to easily corrupt
kernel memory by passing O_NDELAY to the floppy driver; see
https://lkml.org/lkml/2016/2/2/848

so I am really confused as to why these changes were introduced.

The floppy driver is orphaned; no new "features" are added "just
because". Everything that happens there is done to fix real bugs in the
kernel.
I'll look into ways to fix this, but I am afraid this is going to be
really tricky. Therefore we will very likely have to proceed asap with a revert
of 09954bad448 and come up with a workaround that still avoids the bug
reported by syzkaller.