Dave Hansen wrote:
I respectfully disagree. The number one prerequisite for
checkpoint/restart is isolation. Xen just happens to get this for free.
(I don't have my Xen hat on at all for this thread.)

I'm not sure what you mean by "closed files". Either the app has a fd,
it doesn't, or it is in sys_open() somewhere. We have to get the app
into a quiescent state before we can checkpoint, so we basically just
say that we won't checkpoint things that are *in* the kernel.

It's common for an app to write a tmp file, close it, and then open it a bit later expecting to find the content it just wrote. If you checkpoint-kill it in the interim, reboot (clearing out /tmp) and then resume, then it will lose its tmp file. There's no explicit connection between the process and its potential working set of files.

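To make the tmp-file scenario concrete, here is a minimal Python sketch. It is only an illustration: the function name write_then_reopen is made up, and the "reboot" is simulated by deleting a scratch directory, not by a real checkpoint and restart. The point is just that once the file is closed, the process holds nothing that would tell a checkpointer the file matters.

```python
import os
import shutil
import tempfile


def write_then_reopen(simulate_reboot: bool) -> str:
    """Write a scratch file, close it, then try to read it back.

    simulate_reboot=True stands in for "checkpoint, reboot (clearing
    /tmp), resume": the scratch directory is removed while the process
    holds no open fd for the file.
    """
    scratch = tempfile.mkdtemp(prefix="ckpt-demo-")
    path = os.path.join(scratch, "work.tmp")
    try:
        with open(path, "w") as f:
            f.write("intermediate state")
        # The file is closed here. Only the path string remains in the
        # process, so nothing in its fd table refers to the file.

        if simulate_reboot:
            shutil.rmtree(scratch)  # hypothetical stand-in for /tmp cleanup

        try:
            with open(path) as f:
                return f.read()
        except FileNotFoundError:
            return "lost"
    finally:
        # Clean up the scratch directory if it still exists.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    print(write_then_reopen(simulate_reboot=False))
    print(write_then_reopen(simulate_reboot=True))
```

Without the simulated reboot the content comes back; with it, the reopen fails even though the process did nothing wrong.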
So, instead of saying that there's no explicit connection between the
process and its working set, ask yourself how we make a connection.
In this case, we can do it with a filesystem (mount) namespace. Each
container that we might want to checkpoint must have its writable
filesystems contained to a private set that are not shared with other
containers. Things like union mounts would help here, but aren't
necessarily required. They just make it more efficient.
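As a rough illustration of the "private set" requirement described above, the following Python sketch checks that no two containers share a writable filesystem. Everything in it is assumed for the example: the container names, the mount paths, and the function shared_writable_mounts are hypothetical, and a real check would have to read each container's mount namespace instead of a dictionary.

```python
from itertools import combinations


def shared_writable_mounts(containers):
    """Return every pair of containers that share a writable filesystem.

    `containers` maps a container name to the set of writable
    filesystems it can reach. The result maps (name_a, name_b) to the
    set they have in common; an empty dict means every container's
    writable set is private.
    """
    conflicts = {}
    # Sort by name so the pair ordering in the result is deterministic.
    for (a, mounts_a), (b, mounts_b) in combinations(sorted(containers.items()), 2):
        shared = set(mounts_a) & set(mounts_b)
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


if __name__ == "__main__":
    # Hypothetical layout: "batch" wrongly shares a writable tree with "web".
    layout = {
        "web": {"/srv/web-rw"},
        "db": {"/srv/db-rw"},
        "batch": {"/srv/web-rw", "/srv/batch-rw"},
    }
    print(shared_writable_mounts(layout))
```

A container set that passes this check can, in principle, have each container's writable state captured without interference from its neighbours; read-only filesystems are deliberately left out of the model because sharing them does not break that property.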
We were dealing with checkpointing random sets of processes, and that posed all sorts of problems. Filesystem namespace was one, the pid namespace was another. Doing checkpointing at the container-level granularity definitely solves a lot of problems.
Is there anything specific you are thinking of that particularly worries
you? I could write pages on the list you have there.

No, that's the problem; it all worries me. It's a big problem space.

It's almost as big of a problem as trying to virtualize entire machines
and expecting them to run as fast as native. :)

No, it's much harder. Hardware is relatively simple and immutable compared to kernel and process state ;)

Cool! I didn't know you guys did the IRIX implementation. I'm sure you
guys got a lot farther than any of us are. Did you guys ever write any
papers or anything on it? I'd be interested in more information.
Yeah, there was a paper, but it looks like the internet has lost it. It was at http://www.csu.edu.au/special/conference/apwww95/.papers95/cmaltby/cmaltby.ps
http://www.csu.edu.au/special/conference/apwww95/sept-all.html has mention of the paper.