Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

From: Tejun Heo
Date: Thu Nov 04 2010 - 09:07:24 EST


Hello,

On 11/04/2010 01:48 PM, Luck, Tony wrote:
>> If you think only about target processes, yeah sure, you can cover
>> most of the stuff but that's not the impossible part. What's not
>> defined is interaction with the rest of the system and userland.
>> Userland ecosystem is crazy complex. You simply cannot stop, say,
>> banshee or even pidgin, let it mingle with the rest of the system and
>> restore it later in any safe way.
>
> This is why I think it is important to define the limits of
> which kernel state features are covered (or going to be
> covered) by checkpoint/restart - and then list applications
> that are supported (Oren mentioned mysql server in this thread).
> It will always be easy for someone to point at some application
> like powertop and say "we can't migrate that, so checkpoint
> restart is therefore useless" ... this just is not true. This
> can be useful without having to be complete (as long as the
> limits are well defined).
>
>> I'm afraid I can't agree with that. You can store and restore the
>> states which kernel is aware of but that's a very small fraction of
>> the whole picture.
>
> See above - it may be enough to cover a significant number of
> useful cases.

I was arguing that it is far from being _generally_ useful or
transparent. If you're saying that it is something useful for certain
use cases and applications, yeah, sure. I never argued against that.

>> I'm afraid that's not general or transparent at all. It's extremely
>> invasive to how a system is setup and used. It basically is poor
>> man's virtualization or rather partitioning without hardware support
>> and at this point I find it very difficult to justify the added
>> complexity. Let's just make virtualization better.
>
> I don't think that you'll ever make virtualization good enough
> to make the HPC people happy.

If you think about HPC, a userland implementation is enough. In 99% of
cases, those programs just read and write data files and burn a lot of
CPU cycles. You don't need a lot of fancy stuff to do that. The more
important thing would be integrating with job management so that
snapshots and rollbacks can be done automatically.
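
As a rough illustration of what a userland approach can look like for
that kind of job, here is a minimal sketch of application-level
checkpointing. It is not taken from any patchset or tool discussed in
this thread. The names (save_checkpoint, load_checkpoint, run_job), the
JSON snapshot format and the stop_after knob used to simulate a killed
job are all assumptions made up for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old snapshot.

    os.replace() is an atomic rename on POSIX, so a crash during the
    write leaves the previous snapshot in place, not a torn file.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last snapshot, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run_job(path, n_steps, snapshot_every=100, stop_after=None):
    """Toy compute loop that resumes from the last snapshot.

    stop_after is a hypothetical knob that simulates the job being
    killed mid-run: the function returns without saving.
    """
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # simulated kill, work since last snapshot is lost
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % snapshot_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

A job manager could rerun run_job() with the same snapshot path after a
failure. The job then rolls back to the last snapshot and redoes only
the steps after it. This only covers state the application chooses to
save, which is the point: no kernel support is involved.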

I agree that CR would be very useful for certain use cases and
applications. I just can't see where the giant patchset fits between
a userland implementation, which seems enough for the most common use
case of HPC, and virtualization, which is maturing fast.

Thanks.

--
tejun