Re: is killing zombies possible w/o a reboot?

From: Gene Heskett
Date: Wed Nov 03 2004 - 15:43:38 EST


On Wednesday 03 November 2004 15:13, Helge Hafting wrote:
>On Wed, Nov 03, 2004 at 11:24:19AM -0500, Gene Heskett wrote:
[...]
>> Lets just say that I think having to reboot because of a zombie
>> that has resources locked up, and have the reboot fubared by it
>> too, aren't exactly friendly actions.
>
>Did you try logging out from the graphical user interface,
>and then logging in again?

It took around 2 minutes for the logout of X to get back to a VC.
So obviously something slowed it down as thats a 4 second operation
here normally. And it didn't surprise me when the "reboot" shutdown
hung on "stopping alsasound" and I had to use the reset button.
[...]
>> I fully realise that linux has a much more complex method of
>> allocating resources, but doesn't it *know* exactly what resources
>> have been passed out to each process?
>
>Yes it does - the problem is that not all resources are managed
>by processes. Some allocations are managed by drivers, so a driver
>bug can get the device into a unuseable state _and_ tie up the
>process(es) that were using the driver at the moment.

This from my viewpoint, is wrong. The kernel, and only the kernel
should be ultimately responsible for handing out resources, and
reclaiming at its convienience.

>> And why is there no entry from the kill function into that
>> resource management portion of the kernel so that this could also
>> be done by the linux kernel, say with a "kill --total procnumber"?
>
>Interesting, but you might need a path from "kill" into
>every device driver. :-/ And of course it wtill won't work
>if there is a bug in the driver.

Thats the fault of the design IMO.

>> Seems like a heck of a good question to me since an os written to
>> run on a 64k machine in 1981, and expanded to run on a 128K to 2
>> megabyte machine in 1986 can do it just fine. Even if that
>> process is still running and spitting out data to its parent
>> window/shell! Or if its crashed and scribbled over all its
>> memory, makes no difference to os9. You (root) wants it gone,
>> fine, its gone.
>
>Can os9 do this if the process is busy calling into a buggy
>device driver that simply doesn't return or perhaps believes
>that some dma operation into process memory is taking forever?
>Or perhaps os9 doesn't have lots and lots of drivers written by
>different people with varying competence?

It did have quite a few authors involved in it over the years
including me, I did many of its utilities, and converted the rbf.mn
from 6809 code to 6309 code, roughly doubleing its speed without
fiddling with the clock speed, which is married to the video on that
machine. I also did a couple of its clock modules, which are the
heart of the multitasking it does. And yes, it could kill,
absolutely cleanly, any process you named on the command line at any
time. Any drivers involved got their scratch space from the callers
loading of a set of pointers, so if a driver was being accessed by 2
or more processes, each instance had its own stack/process space.
When the process disappeared, the recovery included that space in
memory. The driver proper had no long term history of that processes
actions, even if a disk seek microsleep or similar was in progress
when the caller disappeared.

>Often, the real solution is to fix the driver to deal with
>"unexpected" conditions.
>
>Helge Hafting

As I said earlier, lets let this horse be buried, "its dead Jim", and
my beating on it is only wasting bandwitdh.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.28% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/