Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Nathan Lynch
Date: Thu Nov 04 2010 - 16:54:47 EST


On Thu, 2010-11-04 at 08:36 +0100, Tejun Heo wrote:
> Hello,
>
> On 11/04/2010 02:47 AM, Nathan Lynch wrote:
> >> In this case whitelisting the allowed
> >> state by requiring special APIs for all I/O (or even just standard
> >> APIs as long as they are supported by the C/R lib you're linked against)
> >> is the more pragmatic, and I think faithful approach.
> >
> > I don't think users will go for it. They'll continue to use dodgy
> > out-of-tree kernel modules and/or LD_PRELOAD hacks instead of porting
> > their applications to a new library. I think a C/R library is an
> > "ideal" solution, but it's one that nobody would use - especially in
> > HPC, unless the library somehow provides better performance.
>
> I hear that there are plans to integrate one of the userland
> snapshotting implementations with an HPC workload manager. ISTR the
> combination to be condor + dmtcp but I'm not sure. I think things like
> that make a lot of sense.

If you look at the C/R implementations of those two projects you'll see
that they don't implement what I take to be hch's suggestion - a library
or platform with special-purpose APIs to which applications must be
ported in order to gain C/R ability. For all their good points, the
projects you mention work by interposing on glibc's syscall wrappers,
and they provide a few optional hooks so apps can control certain
aspects of C/R.
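
For anyone unfamiliar with the technique, here is a minimal sketch of
that kind of interposition, assuming glibc and a dynamically linked
application. It is not code from either project: the counter and
cr_write_call_count() are hypothetical names made up for illustration,
and a real tool would track far more state (and be thread-safe).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping; a real C/R layer would record fd state. */
static unsigned long cr_write_calls;

/* Hypothetical optional hook an application could call if it wants to. */
unsigned long cr_write_call_count(void)
{
    return cr_write_calls;
}

/*
 * Same name and signature as glibc's wrapper. When this object comes
 * earlier in the symbol search order (e.g. via LD_PRELOAD), the
 * application's calls to write() resolve here first.
 */
ssize_t write(int fd, const void *buf, size_t count)
{
    static ssize_t (*real_write)(int, const void *, size_t);

    if (!real_write) {
        /* Find the next definition of write() after this object. */
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");
        if (!real_write) {
            errno = ENOSYS;
            return -1;
        }
    }

    cr_write_calls++;           /* not thread-safe; sketch only */
    return real_write(fd, buf, count);
}
```

Built as a shared object (something like "cc -shared -fPIC -o
libcr_sketch.so cr_sketch.c -ldl", file names hypothetical) and loaded
with LD_PRELOAD=./libcr_sketch.so, this needs no changes to the
application, which is the point: nothing is ported to a new API.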

