Agreed. I've seen this kind of hang mainly when some process is killed
while in the kernel (bad kernel pointer dereference or similar),
leaving a buffer on the wrong queue.
> I've seen this mainly on the IDE disk (e.g. moving stuff to the SCSI
> disk often alleviates the problem), but I coulda sworn I saw it once or
> twice when I've been just banging on the SCSI disk.
>
> The system doesn't *completely* hang; I can do things that don't use
> the affected drive. Anything that does (including a sync) hangs forever.
>
> Hitting SHIFT+SCROLLOCK when this happens reveals that in every case
> there is exactly *one* buffer that's locked... so I think there's either
> a deadlock or some code path that's not releasing a buffer when it
> should... I'm not sure this is an Alpha-specific problem either...
It probably _isn't_ alpha-specific. It might just show up more clearly
on the alpha for some reason.
(well, it _could_ be due to differences in irq handling or something
like that on the alpha, but I don't think so).
> I backtracked through David M-T's wonderful collection of prebuilt
> kernels; the problem doesn't appear in 1.3.27 but does appear in 1.3.31
> and later (I've tried all the way through .36). I suppose I could
> look at the diffs, but I was wondering if anyone had any ideas off
> the top of their head...
Well, 1.3.28 changed the internal representation of a "device number"
(so that the kernel internally uses a "kdev_t" rather than the "dev_t").
It also had some other cleanups in device handling that may or may not
have had problems. The code _should_ be equivalent with the 1.3.27
code, though.
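
To make the "kdev_t" vs "dev_t" distinction concrete, here is a rough
user-space sketch of the idea: a separate internal type with small
accessors, so that the rest of the code never picks the number apart by
hand. All the names (my_kdev_t, my_major() and so on) are made up for
this illustration, and the 8-bit major / 8-bit minor split is an
assumption, not a copy of the kernel header.

```c
#include <assert.h>

/*
 * Illustration only: hypothetical names, not the kernel's definitions.
 * Assumes the device number packs an 8-bit major and an 8-bit minor.
 */
typedef unsigned short my_kdev_t;

#define MY_MINORBITS 8
#define MY_MINORMASK ((1U << MY_MINORBITS) - 1)

/* Extract the major number (which driver owns the device). */
static inline unsigned int my_major(my_kdev_t dev)
{
    return (unsigned int)dev >> MY_MINORBITS;
}

/* Extract the minor number (which unit or partition). */
static inline unsigned int my_minor(my_kdev_t dev)
{
    return (unsigned int)dev & MY_MINORMASK;
}

/* Build an internal device number from a major/minor pair. */
static inline my_kdev_t my_mkdev(unsigned int major, unsigned int minor)
{
    return (my_kdev_t)((major << MY_MINORBITS) | (minor & MY_MINORMASK));
}

/* Round-trip check: major 3, minor 1 is the first IDE disk's partition 1. */
static inline void my_kdev_selfcheck(void)
{
    my_kdev_t hda1 = my_mkdev(3, 1);

    assert(my_major(hda1) == 3);
    assert(my_minor(hda1) == 1);
}
```

If every user goes through accessors like these, the internal layout can
change later without touching the callers, which is also why a
conversion of this kind should leave the behaviour unchanged as long as
no caller was missed.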
1.3.29 changed the "mem_map[]" to be a structure, but that shouldn't
matter.
1.3.30 and 1.3.31 shouldn't have changed the buffer handling, although
there might have been driver changes.
I haven't seen the behaviour you mention: I'm using a pre-1.3.38 kernel
on my Cabriolet, and I've been using 1.3.3x kernels on this machine the
whole time. But I have to admit that I haven't used the floppy much,
and maybe what I've been doing can't be called heavy-duty (mostly
compilations while under X11 etc). I certainly haven't seen the
problems.
I'll try to come up with some idea, but as I'll be away in Romania for
the rest of the week starting early tomorrow I suspect I can't help
much. If you can reasonably easily repeat this, can you tell exactly
which kernel it is that starts showing the problem?
Linus