For a lot of applications this issue is actually fairly trivial for the
application itself. The secret is to push the problem into a transaction
server, within or outside of a database. Then the application loop just
becomes:
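    while ((x = next_job())) {
        if (transaction_completed(x))
            continue;       /* already done, skip it */
        if (transaction_began(x))
            rollback(x);    /* undo the half-finished attempt */
        do_job(x);          /* "do" is a C keyword, hence do_job */
    }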
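To make the loop above concrete, here is a minimal sketch in C with a toy
in-memory stand-in for the transaction server. Everything in it is an
assumption for illustration: struct txn_server, the three-state enum, the
applied[] counter, NJOBS, run_jobs() and the name do_job() are made up
here, and a real transaction server would be replicated and persistent
rather than a struct in memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-job transaction states kept by the "server". */
enum txn_state { TXN_NONE, TXN_BEGUN, TXN_COMPLETED };

#define NJOBS 4

/* Toy in-memory stand-in for the transaction server (illustration only). */
struct txn_server {
    enum txn_state state[NJOBS]; /* transaction state of each job */
    int applied[NJOBS];          /* how many times the job's effect is visible */
    int next;                    /* cursor used by next_job() */
};

/* Return the next job id, or -1 once every job has been handed out. */
static int next_job(struct txn_server *ts)
{
    return ts->next < NJOBS ? ts->next++ : -1;
}

static bool transaction_completed(const struct txn_server *ts, int job)
{
    return ts->state[job] == TXN_COMPLETED;
}

static bool transaction_began(const struct txn_server *ts, int job)
{
    return ts->state[job] == TXN_BEGUN;
}

/* Discard the partial effect of a transaction that never committed. */
static void rollback(struct txn_server *ts, int job)
{
    ts->applied[job] = 0;
    ts->state[job] = TXN_NONE;
}

/* Run one job as a transaction: begin, apply the effect, commit. */
static void do_job(struct txn_server *ts, int job)
{
    assert(ts->state[job] == TXN_NONE); /* caller must have rolled back */
    ts->state[job] = TXN_BEGUN;
    ts->applied[job] += 1;
    ts->state[job] = TXN_COMPLETED;
}

/* The application loop: safe to rerun from the top after a crash. */
static void run_jobs(struct txn_server *ts)
{
    int x;

    ts->next = 0;
    while ((x = next_job(ts)) >= 0) {
        if (transaction_completed(ts, x))
            continue;        /* finished before the crash, skip it */
        if (transaction_began(ts, x))
            rollback(ts, x); /* half done, undo before retrying */
        do_job(ts, x);
    }
}
```

In this sketch, restarting run_jobs() after a crash skips committed jobs
and rolls back half-finished ones before redoing them, so each job's
effect ends up applied exactly once.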
What this really amounts to is pushing all the really hard stuff into the
transaction server, because you've got to replicate that too[1]. It makes
a very nice, if sometimes a bit heavyweight, approach to help end users
of such systems, and best of all it's all userspace and libraries.
Of course someone has to write the replicated transaction server, but there
is only one of them, and if it's right the apps should be OK.
Alan
[1] If you are serious about the fault-tolerant side you can rule out
shared-disk solutions, or shared anything. There are no real shortcuts on
this part of the game.