Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Tejun Heo
Date: Thu Nov 04 2010 - 03:40:12 EST


Hello,

On 11/04/2010 02:47 AM, Nathan Lynch wrote:
>> In this case whitelisting the allowed
>> state by requiring special APIs for all I/O (or even just standard
>> APIs as long as they are supported by the C/R lib you're linked against)
>> is the more pragmatic, and I think faithful approach.
>
> I don't think users will go for it. They'll continue to use dodgy
> out-of-tree kernel modules and/or LD_PRELOAD hacks instead of porting
> their applications to a new library. I think a C/R library is an
> "ideal" solution, but it's one that nobody would use - especially in
> HPC, unless the library somehow provides better performance.

I hear that there are plans to integrate one of the userland
snapshotting implementations with an HPC workload manager. ISTR the
combination being Condor + DMTCP, but I'm not sure. I think things like
that make a lot of sense. Scientists writing programs for HPC
clusters already work in given frameworks, and what those applications
do and how to recover are pretty well confined/defined. If you
integrate snapshotting with such frameworks, it becomes pretty easy
for both the admins and users.

I'll talk about other issues in the reply to Oren's email.

Thanks.

--
tejun