I'm not sure what you mean by "closed files". Either the app has a fd,
it doesn't, or it is in sys_open() somewhere. We have to get the app
into a quiescent state before we can checkpoint, so we basically just
say that we won't checkpoint things that are *in* the kernel.

It's common for an app to write a tmp file, close it, and then open it
a bit later expecting to find the content it just wrote. If you
checkpoint-kill it in the interim, reboot (clearing out /tmp) and then
resume, then it will lose its tmp file. There's no explicit connection
between the process and its potential working set of files.

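To make the tmp-file scenario above concrete, here is a rough Python simulation of it. Everything in it is illustrative: `run_scenario`, the `scratch.dat` name, and the `saved_state` dict are made up for the sketch, and "checkpoint" and "reboot" are only modelled by remembering the path and emptying a directory, not by any real checkpoint/restart mechanism.

```python
import os
import tempfile


def run_scenario(tmp_root):
    """Return True if the 'resumed' app still finds its temp file."""
    path = os.path.join(tmp_root, "scratch.dat")  # hypothetical file name

    # The app writes its temp file and closes it. No fd remains open.
    with open(path, "w") as f:
        f.write("intermediate results")

    # "Checkpoint-kill": the saved process image only remembers the path
    # as a string. Nothing ties the process to the file itself.
    saved_state = {"path": path}

    # "Reboot": the temp directory is cleared out.
    for name in os.listdir(tmp_root):
        os.remove(os.path.join(tmp_root, name))

    # "Resume": the app reopens the path it remembers.
    try:
        with open(saved_state["path"]) as f:
            f.read()
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as d:
    print(run_scenario(d))  # prints False: the file did not survive
```

The point of the sketch is only that a checkpoint which records open fds has nothing to record here, because the file was closed at checkpoint time.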
I respectfully disagree. The number one prerequisite for
checkpoint/restart is isolation. Xen just happens to get this for free.
So, instead of saying that there's no explicit connection between the
process and its working set, ask yourself how we make a connection.
In this case, we can do it with a filesystem (mount) namespace. Each
container that we might want to checkpoint must have its writable
filesystems confined to a private set that is not shared with other
containers. Things like union mounts would help here, but aren't
necessarily required. They just make it more efficient.
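As a minimal sketch of that idea, assuming Linux and CAP_SYS_ADMIN, a process can be moved into its own mount namespace and have its writable paths bind-mounted from a per-container directory. This uses the real unshare(2) and mount(2) calls through ctypes; the function name `enter_private_mount_namespace`, the `binds` mapping, and the example backing path are hypothetical, and bind mounts are just one way to get a private writable set, not necessarily what any particular implementation does.

```python
import ctypes
import os

# Flag values from <linux/sched.h> and <sys/mount.h>.
CLONE_NEWNS = 0x00020000
MS_BIND = 0x1000
MS_REC = 0x4000
MS_PRIVATE = 0x40000


def _libc():
    """Load libc and declare the two calls we need."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.unshare.argtypes = [ctypes.c_int]
    libc.unshare.restype = ctypes.c_int
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    return libc


def _check(ret, what):
    """Turn a failing libc return value into an OSError."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def enter_private_mount_namespace(binds):
    """Detach from the shared mount namespace, then bind private dirs.

    binds maps a mount point (e.g. "/tmp") to a per-container backing
    directory. Both names are assumptions made for this sketch.
    """
    libc = _libc()
    _check(libc.unshare(CLONE_NEWNS), "unshare(CLONE_NEWNS)")
    # Keep our mounts from propagating back to the original namespace.
    _check(libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None),
           "make / private")
    for mount_point, backing_dir in binds.items():
        _check(libc.mount(os.fsencode(backing_dir), os.fsencode(mount_point),
                          None, MS_BIND, None),
               f"bind {backing_dir} on {mount_point}")


# Usage (needs root; the backing path is hypothetical):
# enter_private_mount_namespace({"/tmp": "/var/lib/ckpt/c1/tmp"})
```

With something like this, the container's writable files all live under one known directory, so a checkpoint can save that directory along with the process state.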
Is there anything specific you are thinking of that particularly worries
you? I could write pages on the list you have there.

No, that's the problem; it all worries me. It's a big problem space.

It's almost as big of a problem as trying to virtualize entire machines
and expecting them to run as fast as native. :)
Cool! I didn't know you guys did the IRIX implementation. I'm sure you
guys got a lot farther than any of us have. Did you guys ever write any
papers or anything on it? I'd be interested in more information.