32GB SSD on USB1.1 P3/700 == ___HELL___ (2.6.34-rc3)

From: Andreas Mohr
Date: Sun Apr 04 2010 - 18:14:01 EST


[CC'd some lucky candidates]

Hello,

I was just running
mkfs.ext4 -b 4096 -E stride=128 -E stripe-width=128 -O ^has_journal \
    /dev/sdb2
on my SSD18M connected via USB1.1, and the result was, well,
absolutely, positively _DEVASTATING_.

The entire system became _FULLY_ unresponsive; not even switching back
down to tty1 via Ctrl-Alt-F1 worked (it took 20 seconds for even this key
combination to be respected).

Once back on the ttys, any command I invoked locked up for minutes
(note that I'm talking about attempted additional I/O to the _other_,
_unaffected_ main system HDD, such as loading some shell binaries,
NOT to the external SSD18M!!).

Attempting to write a 300M file from /dev/zero to the SSD's filesystem
was even worse (again tons of unresponsiveness), combined with multiple
OOM conditions flying by (I/O to the main HDD was minimal, its LED was
almost always _off_, yet everything was stuck at an absolute standstill).

Clearly a very, very important limiter somewhere in the bio layer is
missing or broken; a 300M dd from /dev/zero should never manage to impose
such an onerous penalty on a system, IMHO.
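
For a sense of scale, here is a rough back-of-the-envelope sketch of how
long 300M of buffered writes would need to drain over this link. The only
hard number is the 12 Mbit/s USB1.1 full-speed signalling rate; the
1 MB/s "effective" figure and the helper name drain_seconds() are
assumptions made up for illustration, not measurements from this box.

```python
MIB = 1024 * 1024

# USB 1.1 full-speed signalling rate: 12 Mbit/s, i.e. 1.5 MB/s before
# any protocol overhead.
USB11_RAW_BPS = 12_000_000 / 8

# ASSUMPTION: sustained bulk throughput after overhead, picked as a
# round illustrative figure. It was not measured on the reported system.
ASSUMED_EFFECTIVE_BPS = 1_000_000


def drain_seconds(total_bytes, bytes_per_second):
    """Seconds needed to push total_bytes through a link that sustains
    bytes_per_second (hypothetical helper, simple division)."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second


write_size = 300 * MIB
best_case = drain_seconds(write_size, USB11_RAW_BPS)
assumed_case = drain_seconds(write_size, ASSUMED_EFFECTIVE_BPS)

print("raw link rate:      %.0f s" % best_case)
print("assumed throughput: %.0f s" % assumed_case)
```

Even at the raw signalling rate this comes to about three and a half
minutes, so whatever part of the 300M gets accepted into memory up front
stays there for a long time.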


I've got SysRq-W traces of these lockup conditions if wanted.


Not sure whether this is a 2.6.34-rc3 thing, might be a general issue.

Likely the lockup behaviour is a symptom of very high memory pressure.
But this memory pressure shouldn't even be allowed to happen in the first
place, since the dd submission rate should immediately get limited by the kernel's
bio layer / elevators.
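
One way to check the memory pressure theory would be to watch the Dirty
and Writeback counters in /proc/meminfo while the dd runs. A minimal
sketch follows; the function names are hypothetical, and the SAMPLE
numbers are invented to demonstrate the parsing, they were NOT captured
on the affected machine.

```python
def parse_meminfo(text, fields=("MemTotal", "Dirty", "Writeback")):
    """Pick the requested fields out of /proc/meminfo-style text.
    Returns a dict mapping field name to its value in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name not in fields:
            continue
        parts = rest.split()
        if parts:
            values[name] = int(parts[0])
    return values


def dirty_fraction(values):
    """Share of total RAM that is dirty or currently under writeback."""
    return (values["Dirty"] + values["Writeback"]) / values["MemTotal"]


def read_live(path="/proc/meminfo"):
    """Read the live counters (Linux only, not called in this sketch)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Invented sample input, for illustration only.
SAMPLE = """MemTotal:         256000 kB
MemFree:            4000 kB
Dirty:             48000 kB
Writeback:         16000 kB
"""

sample_values = parse_meminfo(SAMPLE)
print("dirty + writeback: %.0f%% of RAM" % (100 * dirty_fraction(sample_values)))
```

Calling read_live() in a loop during the dd would show whether dirty
memory keeps climbing instead of levelling off once the writer gets
throttled.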

Also, I'm wondering whether some cond_resched() calls should additionally
be inserted in some places, to at least cope better with such a
broken situation.

Thanks,

Andreas Mohr