Re: X server and OOM kill

Jon M. Taylor (taylorj@ecs.csus.edu)
Wed, 12 Aug 1998 14:36:44 -0700 (PDT)


On Thu, 13 Aug 1998, David Luyer wrote:

>
> Another problem with X which cannot be fixed is out-of-memory killing
> and restoration of state.
>
> There's a bug which bites me every now and then
> (something to do with doing a 'find text' in netscape under X with
> afterstep and with both the "magic options" set using XF86_SVGA
> and a very large document, usually a Cisco CD document) which causes
> the X server (!!) process to bloat out and die (if I kill netscape
> while the machine is going mad from a remote telnet, the X server
> does not reclaim the memory either). This hasn't happened to me
> for about a month now, but that's quite possibly since I now avoid
> searching the documents in that way (I use grep then go to the
> location instead). Under afterstep I could fix the problem by
> removing what they called the "magic voodoo options" but windowmaker
> offers no fix.
>
> Anyway I agree this is a buggy application problem (where the
> application is XF86_SVGA being triggered by something in
> netscape) but my point is, if X is killed due to an out-of-memory
> situation, it gets zero, count them, zero chances to restore the
> video state. While it's not a normal situation, it can happen.

Precisely. And then what do you do? You *cannot* just kill the X
server. If you force a VT switchaway and then kill the X server that
might work, but that would require a special case in the kernel.

> The solution?
> 1) reliable X servers

The most reliable X server in the world still needs to be able to
avoid being killed during a critical section. It can't do that(?) without
kernel help.
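
To put something concrete behind that "(?)": a process can hold off most signals for the length of a critical section, but not SIGKILL. The sketch below assumes the out-of-memory kill arrives as SIGKILL (nothing in this thread says so explicitly), and `blockable` is just a name I made up for the illustration.

```python
import signal

def blockable(signum):
    """Return True if this thread can add signum to its blocked-signal mask."""
    # Ask the kernel to block the signal, remembering the previous mask.
    old = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    # Restore the previous mask; the return value is the mask that was in
    # effect while the block request was active.
    during = signal.pthread_sigmask(signal.SIG_SETMASK, old)
    return signum in during

if __name__ == "__main__":
    print("SIGTERM blockable:", blockable(signal.SIGTERM))
    print("SIGKILL blockable:", blockable(signal.SIGKILL))
```

The request to block SIGKILL is accepted without an error and then ignored, which is why userspace alone cannot make a critical section kill-proof.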

> 2) video cards which you don't need to know the state for!

I.E., no SVGA-style hardware of any kind. You have to get into
workstation-level hardware to find intelligent designs that let you read
back all registers, interrupt accel commands in the middle, etc.

> If you are complaining about X dying for no apparent reason on
> a card which it is impossible for you to restore without knowing
> state, what you are saying is...
> "Linux should guard me against application bugs which my hardware
> bugs mean I can't get a userspace app to guard me against".

Right. This isn't always a bad idea, but when it starts to entail
ugly hacks or loss of efficiency or lots of coding, one begins to question
the wisdom of doing this.

> Even then that isn't quite true. You could store a "current
> state" in a SYSVSHM segment and whenever making a change which
> could put the system in a non-recoverable state record the new
> state there.

Someone else also wrote me with this idea. This solves the
"everyone must reset the card on VT switchaway" problem, but does nothing
to ensure the atomicity of the critical sections.

> ie; daemon sits watching xserver or watching nattach on a shm
> segment. xserver always keeps shm segment up to date with
> restoration instructions. when executing a critical state
> change, xserver writes to the segment giving intent to do this,
> does it, writes that it is done. if it dies when the segment
> does not indicate it's in the middle of a state change, the
> daemon just restores. if it dies in the middle of a state
> change, the daemon does something smart possibly requiring some
> kind of user input(?).

That last is the killer. It is basically an insoluble problem.
There is no way the daemon or the user can possibly know how to finish up
all possible atomic state changes or accel commands. Once you enter such
a critical code section, you must go all the way through, though the
heavens fall. Anything else risks leaving the card in a state that nobody
knows how to recover from.
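
For what it's worth, here is a rough sketch of the intent-flag scheme David describes, showing where it stops helping. A bytearray stands in for the SYSVSHM segment, and the layout (one phase byte plus four 32-bit register values) and all function names are invented for the example, not taken from any real X server.

```python
import struct

# Hypothetical segment layout: phase flag, then four saved register values.
LAYOUT = struct.Struct("<B4I")
IDLE, IN_PROGRESS = 0, 1

def new_segment(regs):
    """Create the stand-in segment holding a known-good register set."""
    seg = bytearray(LAYOUT.size)
    LAYOUT.pack_into(seg, 0, IDLE, *regs)
    return seg

def begin_change(seg):
    """Server: declare intent before touching the hardware."""
    struct.pack_into("<B", seg, 0, IN_PROGRESS)

def commit(seg, regs):
    """Server: record the new state and mark the change as finished."""
    LAYOUT.pack_into(seg, 0, IDLE, *regs)

def recover(seg):
    """Daemon: decide what to do after the server has died."""
    phase, *regs = LAYOUT.unpack_from(seg, 0)
    if phase == IDLE:
        return ("restore", regs)   # saved state matches the hardware
    return ("unknown", regs)       # died mid-change: saved state is stale

if __name__ == "__main__":
    seg = new_segment([1, 2, 3, 4])
    begin_change(seg)
    print(recover(seg))
```

The "unknown" branch is the whole problem: the flag tells the daemon that a change was interrupted, but not how far the hardware got.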

> I'm not sure exactly. But this is userspace and at worst
> reduces it from an easy to cause problem to a race condition.

Like I said, this "football" idea, where every userspace video
card driver hands off the current state to the others through a shared
memory segment, is not a bad idea. Each program would have to be able to
parse the register data back into more abstract hardware settings
though, unless you also passed "hints" in the shared memory.
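
As a sketch of what passing "hints" alongside the raw registers might look like: the record below carries a few abstract settings in a header, followed by the register dump. The field choice (width, height, bits per pixel) and the function names are my own assumptions for illustration.

```python
import struct

# Hypothetical hint header: width, height, bits per pixel, register count.
HEADER = struct.Struct("<HHBH")

def pack_handoff(width, height, bpp, regs):
    """Serialize the hints plus the raw register values into one blob."""
    body = struct.pack("<%dI" % len(regs), *regs)
    return HEADER.pack(width, height, bpp, len(regs)) + body

def unpack_handoff(blob):
    """Read the hints directly; the raw registers come along unparsed."""
    width, height, bpp, count = HEADER.unpack_from(blob, 0)
    regs = list(struct.unpack_from("<%dI" % count, blob, HEADER.size))
    return {"width": width, "height": height, "bpp": bpp, "regs": regs}

if __name__ == "__main__":
    blob = pack_handoff(1024, 768, 8, [0x3C2, 0x3D4])
    print(unpack_handoff(blob))
```

A program that understands the hints can set up its own mode from them and treat the register dump as opaque, which avoids having to reverse-engineer settings from register values.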

Jon

---
'Cloning and the reprogramming of DNA is the first serious step in 
becoming one with God.'
	- Scientist G. Richard Seed
