Re: Failover Kernel

From: david
Date: Thu Mar 05 2009 - 20:10:34 EST


On Wed, 4 Mar 2009, Tarkan Erimer wrote:

On 03/03/2009 05:29 AM, David Newall wrote:
It sounds like you want everything to just continue running.

Yes, exactly. The backup kernel will take control when a crash occurs, without needing a reboot or halt.

I don't see how that can be done. All of those in-kernel tables and structures
would need to be migrated, and it follows, because there was a crash,
that any of them might have been corrupted. Worse, you want this to
save you when you try running a new kernel which crashes, and being a
new kernel, it follows that any of those structures could be different;
it might not be possible to create equivalent structures for different
kernel versions.


Yes, that's right, and it's the first thing that needs to be overcome. Maybe it could be implemented like this:

- Primary kernel could be 2.6.x or 2.6.x.y (2.6.28 or 2.6.28.1)
- Backup kernel could only be one of the .y fix releases, like 2.6.28.5

So, when both kernels are from the same version, kernel API and structure changes are avoided.
For resuming with the backup kernel: the primary kernel could write a journal of the things the backup needs in order to resume, such as process IDs and memory and process state. This is the same approach journaling file systems take (they write a journal of what they did, so they can recover or resume after a crash or disaster).

Wrong, kernel structures can change in any patch. They can even change with different configuration options.
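
As a rough illustration of the configuration point, here is a small user-space sketch in Python using ctypes. The two record types are hypothetical and are not real kernel structures; they only stand in for one structure built with and without an optional field. Reading bytes written under one layout through the other layout silently yields wrong values.

```python
import ctypes

# Hypothetical structure, NOT a real kernel type: the layout as built
# with an optional feature disabled.
class TaskRecordPlain(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_int32),
        ("state", ctypes.c_int32),
    ]

# The same hypothetical structure as built with the option enabled:
# an extra field sits ahead of `state`, so `state` moves.
class TaskRecordDebug(ctypes.Structure):
    _fields_ = [
        ("pid", ctypes.c_int32),
        ("debug_cookie", ctypes.c_int32),
        ("state", ctypes.c_int32),
    ]

def reinterpret(record, as_type):
    """Copy the raw bytes of `record` and read them back as `as_type`."""
    raw = bytes(record)
    # Pad with zeros in case the target layout is the larger one.
    buf = raw.ljust(ctypes.sizeof(as_type), b"\0")
    return as_type.from_buffer_copy(buf)

# The "primary" writes a record using the layout with the option enabled.
written = TaskRecordDebug(pid=1234, debug_cookie=0x5A5A, state=1)

# The "backup" reads the same bytes using the layout without the option.
seen = reinterpret(written, TaskRecordPlain)

print("sizes:", ctypes.sizeof(TaskRecordPlain), ctypes.sizeof(TaskRecordDebug))
print("offset of state:", TaskRecordPlain.state.offset, TaskRecordDebug.state.offset)
print("state written:", written.state, "state seen:", seen.state)
```

Here `pid` happens to survive because it sits at the same offset in both layouts, but `state` is read from where `debug_cookie` was stored, and nothing flags the mismatch.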

But even if they are the same version with the same configuration options, that doesn't address the fact that you can't trust the in-kernel structures, because they may have been damaged by whatever caused the crash.
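
A minimal sketch of why a journal alone does not settle the trust question, again in user-space Python. The record format (two little-endian 32-bit integers followed by a CRC32) and the names `seal` and `verify` are assumptions made up for this example, not anything proposed in the thread. A checksum catches damage that happens after a record is written, but a value that was already wrong when it was journaled gets a perfectly valid checksum.

```python
import struct
import zlib

def seal(pid, state):
    """Build a hypothetical journal record: payload followed by its CRC32."""
    payload = struct.pack("<ii", pid, state)
    return payload + struct.pack("<I", zlib.crc32(payload))

def verify(record):
    """Return True if the stored CRC32 matches the payload."""
    payload = record[:-4]
    (stored_crc,) = struct.unpack("<I", record[-4:])
    return zlib.crc32(payload) == stored_crc

# Case 1: a record written from healthy state.
good_record = seal(1234, 1)

# Case 2: the record is damaged after it was journaled (one bit flipped).
damaged_record = bytes([good_record[0] ^ 0x01]) + good_record[1:]

# Case 3: the in-memory value was already wrong (say, overwritten by a
# stray write) before the record was journaled. 9999 is the wrong pid.
already_wrong_record = seal(9999, 1)

print("good record verifies:", verify(good_record))
print("damaged record verifies:", verify(damaged_record))
print("already-wrong record verifies:", verify(already_wrong_record))
```

The third case is the one that matters for a crashing kernel: the record passes verification even though its contents are wrong, so the backup has no way to tell from the journal alone.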

David Lang