Badblocks and no free pages...
4 May 1997 07:43:32 -0000

I'm running 2.0.30 on a 32 Meg Triton chipset system with two IDE
drives. (Which I've recently configured to be on different buses,
one sharing with a seldom-used CD-ROM drive.)

Anyway, someone, and I forget who, suggested that four concurrent
badblocks invocations (one for each quarter of the disk) were a great
way to shake out general I/O flakiness. So I fired it up on my
secondary 2G disk (/dev/hdc), and then RTFM'd and realized what the "-w"
flag meant. Oops. Good thing I picked the harmless disk to trash.
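For the record, the four-way split can be sketched like this. The device
name and total block count are my assumptions, and I echo the commands
instead of running them, since badblocks -w destroys whatever is on the
device. (badblocks takes an optional last-block and first-block after
the device name.)

```shell
#!/bin/sh
# Sketch: split a disk into quarters and print one badblocks -w
# command per quarter.  DISK and BLOCKS are assumptions for a 2G
# disk with 1k blocks.  Commands are echoed, not run -- drop the
# "echo" only if you really mean to overwrite the device.
DISK=/dev/hdc
BLOCKS=2097152            # total 1k blocks (assumed)
Q=$((BLOCKS / 4))         # blocks per quarter

i=0
while [ $i -lt 4 ]; do
    FIRST=$((i * Q))
    LAST=$((FIRST + Q - 1))
    echo "badblocks -w $DISK $LAST $FIRST &"
    i=$((i + 1))
done
```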

Y'know, a check (like fsck) for a mounted file system (or an unmounted
one, for that matter!) would be *really* nice.

Anyway, badblocks writes patterns (0xAA, 0x55, 0xFF and 0x00) to the
disk and then re-reads them, checking for errors. Running four in
parallel slows the disk down, since it has to seek so much, builds up
long I/O queues, and really gives the I/O subsystem a hell of a workout.
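The write-then-verify cycle can be sketched harmlessly against an
ordinary file instead of a device. The file name and size are my
assumptions, and this is only the idea of the test, not badblocks'
actual internals:

```shell
#!/bin/sh
# Harmless sketch of the pattern test badblocks -w performs, run
# against a scratch file.  For each pattern: fill the target, then
# regenerate the same pattern stream and compare it with what reads
# back.  TARGET and BLOCKS are assumptions.
TARGET=/tmp/bbtest.img
BLOCKS=64                                   # 64 x 1k test blocks (assumed)
FAILED=0

for PAT in '\252' '\125' '\377' '\000'; do  # 0xAA, 0x55, 0xFF, 0x00
    # write pass: fill the target with the pattern
    dd if=/dev/zero bs=1024 count=$BLOCKS 2>/dev/null \
        | tr '\000' "$PAT" > "$TARGET"
    # verify pass: regenerate the pattern and compare with what's there
    dd if=/dev/zero bs=1024 count=$BLOCKS 2>/dev/null \
        | tr '\000' "$PAT" | cmp -s - "$TARGET" \
        || { echo "mismatch with pattern $PAT"; FAILED=1; }
done
rm -f "$TARGET"
```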

A few things I noticed. First, that is a *very* good way to elicit
"Couldn't get a free page....." messages. I bumped /proc/sys/vm/freepages
from the default 64/96/128 to 128/192/256, which reduced, but did
not eliminate, these messages.
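For anyone who wants to try the same tweak, this is all it took (needs
root; the three numbers are the min/low/high free-page thresholds on
2.0-era kernels):

```shell
# Show the current thresholds, then raise them as described above.
cat /proc/sys/vm/freepages
echo "128 192 256" > /proc/sys/vm/freepages
```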

Second, while writing, the system is *amazingly* sluggish. I watched
xterm refresh its window at a vertical speed of a few pixels per second.
The mouse doesn't move. It takes a minute to switch consoles.
It takes seconds to log in on a text console.

This doesn't happen while reading back to verify, only while writing.
Does anyone have any idea what's going on? Is the system clogged with
write-behind buffers and thrashing, or something?

Can I report the following as bugs?
- The fact that badblocks -w does nothing to prevent accidents like
the one above is a bit unfortunate.
- The fact that it generates "Couldn't get a free page" seems bad.
In particular, why should this happen during writing? What needs to
do an atomic page allocation?
- The unusable sluggishness of the machine is a bug.