Re: Swap space map bad

Paul Wouters (paul@xtdnet.nl)
Wed, 5 Mar 1997 11:45:21 +0100 (MET)


On Tue, 4 Mar 1997, Solitude wrote:

> It seems either my machine has a hardware problem or I have found a bug in
> the 2.0.x kernel series. I have been using 2.0.28 for the past month, and
> have been using 2.0.0 before that. The problem that I am having is that
> the machine will, for no apparent reason, begin to report
>
> Hmm.. Trying to use unallocated swap (00003300)
> swap_free: swap-space map bad (entry 00003300)
> swap_free: swap-space map bad (entry 00003300)
> swap_duplicate: trying to duplicate unused page
> swap_duplicate: trying to duplicate unused page
>
> These messages went streaming down the console making it totally unusable
> and wrote about 1/2 a meg of these errors in syslog. The error persisted
> for about 5 minutes. When I tried to do just about anything on the system
> I got the error:
>
> /bin/sh: fork: Try again

You were lucky to be there. I wasn't there when my machine started
to utter the same messages. It started with:

Mar 4 13:30:01 bean kernel: Unable to load interpreter
Mar 4 14:41:09 bean kernel: Unable to load interpreter
Mar 4 14:41:09 bean last message repeated 3 times
Mar 4 14:45:01 bean kernel: Unable to load interpreter
Mar 4 14:45:01 bean kernel: Unable to load interpreter
Mar 4 15:16:52 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 15:17:38 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 15:18:16 bean last message repeated 8 times
Mar 4 15:20:02 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 15:20:51 bean last message repeated 5 times
Mar 4 15:23:22 bean kernel: Hmm.. Trying to use unallocated swap (00003700)

I was still around when this happened. I noticed a 'problem' because my
shared libraries couldn't be loaded anymore, and programs I started
couldn't run. This time I didn't get the fork() error, but I've had
that happen to me as well. This went away after a few seconds.
I left, and when I came back a few hours later the machine had rebooted
(probably by watchdog not being able to fork; no log entry was made).
In the logs I had more of the same:

Mar 4 17:03:09 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 17:03:10 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 17:03:10 bean kernel: Hmm.. Trying to use unallocated swap (00003700)
Mar 4 17:05:00 bean kernel: swap_duplicate: trying to duplicate unused page
Mar 4 17:05:00 bean kernel: Hmm.. Trying to use unallocated swap (00003700)
[ ... ]
Mar 4 17:46:40 bean kernel: Unable to load interpreter
Mar 4 17:47:42 bean last message repeated 7 times
Mar 4 17:48:41 bean last message repeated 4 times
Mar 4 17:49:54 bean last message repeated 2 times
Mar 4 17:50:15 bean kernel: Unable to load interpreter
Mar 4 17:59:40 bean kernel: Kernel logging (proc) started.
Mar 4 17:59:40 bean kernel: Console: 16 point font, 400 scans

So it seems like it got shot by watchdog around 17:51. This machine is
a file server, so I guess it was busy checking almost 10 GB of file
systems until 17:59, when it became fully functional again.
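
For anyone who wants to pull the same kind of timeline out of their own
logs, here is a rough sketch in Python that tallies kernel messages and
folds the "last message repeated N times" lines back into the count of
the message they follow. It assumes the classic syslog line layout shown
above (month, day, time, host, message); the function name tally() and
the regular expressions are my own illustration, not part of any tool.

```python
import re
from collections import Counter

# Classic syslog layout: "Mar 4 15:16:52 bean kernel: message text"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+(\S+)\s+(.*)$")
# syslogd's compression marker for repeated messages
REPEAT_RE = re.compile(r"^last message repeated (\d+) times?$")


def tally(lines):
    """Count kernel messages, expanding 'last message repeated N times'.

    Assumption: a repeat marker refers to the kernel message logged
    immediately before it in the same file.
    """
    counts = Counter()
    last = None  # text of the most recent kernel message, if any
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a syslog line
        _stamp, _host, rest = m.groups()
        rep = REPEAT_RE.match(rest)
        if rep:
            if last is not None:
                counts[last] += int(rep.group(1))
            continue
        if rest.startswith("kernel: "):
            last = rest[len("kernel: "):]
            counts[last] += 1
        else:
            last = None  # a non-kernel message breaks the repeat chain
    return counts
```

To use it, pass it the open log file, for example
tally(open("/var/log/messages")); that path is an assumption and
depends on how syslogd is configured on the machine.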

Other log files were not so revealing; the mail log shows a bunch of:

Mar 4 17:13:32 bean sendmail[2088]: NOQUEUE: SYSERR(root): SMTP-MAIL: died on signal 11

Then:

Mar 4 18:00:12 bean sendmail[260]: starting daemon (8.8.5): SMTP+queueing@00:30:00

> Despite this error I was able to log in as root and run shutdown. I don't
> understand this because the login process, the ability to start a shell,
> and the ability for that shell to fork off to the shutdown binary should
> have been impossible if nothing could fork. I couldn't even get ls to
> work. I have had these problems with both 2.0.0 and 2.0.28. This has
> happened about four times in the last year, but three of those times were
> within the last month and a half. If someone could tell me I have bad
> hardware, or has a patch for swap_free() or swap_duplicate() in the kernel
> I would be very appreciative.

I've had this happen a few times in the last few months. However, this
is the first time it resulted in a crash. The machine was upgraded a
few months ago to an Asus motherboard with a P120. I am not sure if
that started the problems. The swap disk is on a SCSI device, which
caused some problems with the SCSI DAT tape before, so I have disabled
all the ncr options like disconnect/reconnect. After this email, I
will turn off the swap file and rerun mkswap on it just in case ;)

Does anyone know what one might be able to do to track this one down
if it happens again?

Paul

I am running 2.0.29