Re: Back to the future.

From: Nigel Cunningham
Date: Thu Apr 26 2007 - 15:56:51 EST


On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote:
> On Thu, 26 Apr 2007, Nigel Cunningham wrote:
> >
> > * Doing things in the right order? (Prepare the image, then do the
> > atomic copy, then save).
> I'd actually like to discuss this a bit..
> I'm obviously not a huge fan of the whole user/kernel level split and
> interfaces, but I actually do think that there is *one* split that makes
> sense:
> - generate the (whole) snapshot image entirely inside the kernel
> - do nothing else (ie no IO at all), and just export it as a single image
> to user space (literally just mapping the pages into user space).
> *one* interface. None of the "pretty UI update" crap. Just a single
> system call:
> void *snapshot_system(u32 *size);
> which will map in the snapshot, return the mapped address and the size
> (and if you want to support snapshots > 4GB, be my guest, but I suspect
> you're actually *better* off just admitting that if you cannot shrink
> the snapshot to less than 32 bits, it's not worth doing)

That inherently limits the image to half of available ram (you need
somewhere to store the snapshot), so you won't get the full image you
express interest in below.

> User space gets a fully running system, with that one process having that
> one image mapped into its address space. It can then compress/write/do
> whatever to that snapshot.

You're describing uswsusp! (At least in so far as I understand it!).

You can't get a fully running system though, because if anything changes
something on disk that was snapshotted (super blocks etc) your snapshot
is invalid and you risk on-disk corruption.

> And btw, the device model changes are a big part of this. Because I don't
> think it's even remotely debuggable with the full suspend/resume of the
> devices being part of generating the image! That freeze/snapshot/unfreeze
> sequence is likely a lot more debuggable, if only because freeze/unfreeze
> is actually a no-op for most devices, and snapshotting is trivial too.
> Once you have that snapshot image in user space you can do anything you
> want. And again: you'd hav a fully working system: not any degradation
> *at*all*. If you're in X, then X will continue running etc even after the
> snapshotting, although obviously the snapshotting will have tried to page
> a lot of stuff out in order to make the snapshot smaller, so you'll likely
> be crawling.

Nooooooo! See above about disk corruption.

> > * Mulithreaded I/O (might as well use multiple cores to compress the
> > image, now that we're hotplugging later).
> > * Support for > 1 swap device.
> > * Support for ordinary files.
> > * Full image option.
> > * Modular design?
> I'd really suggest _just_ the "full image". Nothing else is probably ever
> worth supporting. Your "snapshot to disk" wouldn't be _quite_ as simple as
> "echo disk > /sys/power/state", but it should not necessarily be much
> worse than

Please, go apply that logic elsewhere, then cut out (or at least stop
adding) support for users with less common needs in other areas. I fully
acknowledge that most users have only one place to store their image and
it's a swap device. But that doesn't mean one size fits all.

A full image implies that you need to figure out what's not going to
change while you're writing it and save that separately. At the moment,
I'm treating most of the LRU contents as that list. If we're going to
start trying to let every man and his dog run while we're trying to
snapshot the system, that's not going to work anymore - or the logic
will get a lot more complicated.

Sorry. I never thought I'd say this, but I think you're being naive
about how simple the process of snapshotting a system is.



Attachment: signature.asc
Description: This is a digitally signed message part