On Mon, 12 Jul 2010 02:49:55 -0700 (PDT)
david@xxxxxxx wrote:
On Mon, 12 Jul 2010, Takuya Yoshikawa wrote:
and RA reports to Pacemaker that Qemu is already stopped.
(*2): Currently we check for the "shut off" answer from the domstate command.
Yes, we should handle both SHUTOFF and CRASHED if possible.
4: Pacemaker finally sends a stop command to confirm that it can safely
start failover. After killing Qemu, RA replies "OK" to Pacemaker so that
Pacemaker can start failover.
Problem: we lose debugging information about the VM, such as the contents
of guest memory.
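A minimal Python sketch of the current RA behaviour described above, assuming the RA has already captured the output of `virsh domstate <domain>` as a string. The exit codes are the standard OCF resource agent values; the `destroy` callable is a hypothetical stand-in for however the RA kills Qemu (for example by wrapping `virsh destroy`), and reporting any other state as an error is an assumption of this sketch, not something stated in the thread.

```python
# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

# domstate answers treated as "already stopped". The current RA checks only
# "shut off"; "crashed" is the addition discussed above.
STOPPED_STATES = {"shut off", "crashed"}


def monitor(domstate: str) -> int:
    """Map `virsh domstate` output to an OCF monitor result."""
    state = domstate.strip().lower()
    if state == "running":
        return OCF_SUCCESS
    if state in STOPPED_STATES:
        return OCF_NOT_RUNNING
    # Assumption: any other state (paused, idle, ...) is reported as an error.
    return OCF_ERR_GENERIC


def stop(domstate: str, destroy) -> int:
    """Current behaviour: kill Qemu unless it is already shut off, then report OK.

    `destroy` is a hypothetical callable that forcibly terminates the guest.
    """
    if domstate.strip().lower() != "shut off":
        destroy()  # guest memory is lost here, which is the problem noted above
    return OCF_SUCCESS
```

The point of the sketch is step 4: the only way this RA can make failover safe is to kill Qemu, which is exactly what throws away the guest memory.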
the OCF interface has start, stop, and status (running, not running, or an
error), plus API info.
what I would do in this case is have the script notice that the guest is in
the crashed state and return an error if it's told to start it. This will
cause pacemaker to start the service on another system.
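The start logic suggested above could look roughly like this Python sketch. It assumes the RA passes in the current `virsh domstate` answer; `create` is a hypothetical callable standing in for whatever boots the guest (for example a wrapper around `virsh start`). Treating start on an already-running guest as success follows the usual OCF expectation that start is idempotent.

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def start(domstate: str, create) -> int:
    """Start action following the suggestion above.

    `domstate` is the current `virsh domstate` answer; `create` is a
    hypothetical callable that boots the guest.
    """
    state = domstate.strip().lower()
    if state == "crashed":
        # Refuse to restart a crashed guest locally; the failed start makes
        # Pacemaker try the service on another system instead.
        return OCF_ERR_GENERIC
    if state == "running":
        # Starting an already-running resource should succeed (idempotent start).
        return OCF_SUCCESS
    create()
    return OCF_SUCCESS
```

No new domain state is needed for this: the decision is made entirely inside the RA from the existing "crashed" answer.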
I see.
So the key point is how to check the target status (crashed, in this case).
From the HA point of view, we need qemu to guarantee that:
- the guest never starts again
- the VM never modifies external resources
But I'm not sure whether qemu currently guarantees these conditions in a
generic manner.
In general, I agree that we should always start the guest on another node
for failover. But would there be any benefit to being able to start it on
the same node?
if it's told to stop it, do whatever you can to save state, but definitely
pause/freeze the instance and return 'stopped'.
no need to define an additional state. As far as pacemaker is concerned,
it's safe as long as there is no chance of the instance changing the state
of any shared resources that the other system would use, so simply pausing
the instance will make it safe. It gets interesting when someone wants to
investigate what's going on inside the instance (you need to have it be
functional, but not able to use the network or any shared
drives/filesystems), but I don't believe that you can get that right in a
generic manner; the details of what will cause grief and what won't will
vary from site to site.
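A rough Python sketch of the stop behaviour suggested above, assuming the RA already knows the guest has crashed. `save_state` and `freeze` are hypothetical callables: `freeze` might wrap `virsh suspend`, and `save_state` whatever dump a given site wants to keep. In OCF terms, "return 'stopped'" means the stop action exits with `OCF_SUCCESS` (0).

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def stop_crashed(save_state, freeze) -> int:
    """Stop action for a crashed guest, per the suggestion above.

    `save_state` and `freeze` are hypothetical callables supplied by the RA.
    """
    try:
        save_state()  # best effort: losing the dump must not block failover
    except Exception:
        pass
    try:
        freeze()  # mandatory: a paused guest cannot touch shared resources
    except Exception:
        return OCF_ERR_GENERIC  # not safe; report that the stop failed
    return OCF_SUCCESS  # report "stopped" so the service can move elsewhere
```

The asymmetry is deliberate: saving state is optional, but if the instance cannot be frozen the RA must not claim it is stopped.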
If we cannot decide this in a generic manner, we usually choose the most
conservative option: memory and ... preservation only.
What concerns us most is whether qemu actually guarantees the conditions we
are talking about in this thread.
B. Our proposal: "introduce a new domain state to indicate failover-safe"
Pacemaker...(OCF)....RA...(libvirt)...Qemu
    |                 |                 |
    |                 |                 |
1:  +---- start ----->+---------------->+ state=RUNNING
    |                 |                 |
    +---- monitor --->+---- domstate -->+
2:  |                 |                 |
    +<---- "OK" ------+<--- "RUNNING" --+
    |                 |                 |
    |                 |                 |
    |                 |                 * Error: state=FROZEN
    |                 |                 |   Qemu releases resources
    |                 |                 |   and VM gets frozen. (*3)
    +---- monitor --->+---- domstate -->+
3:  |                 |                 |
    +<-- "STOPPED" ---+<--- "FROZEN" ---+
    |                 |                 |
    +---- stop ------>+---- domstate -->+
4:  |                 |                 |
    +<---- "OK" ------+<--- "FROZEN" ---+
    |                 |                 |
    |                 |                 |
1: Pacemaker starts Qemu.
2: Pacemaker checks the state of Qemu via RA.
   RA checks the state of Qemu using virsh (libvirt).
   Qemu replies "RUNNING" (normally executing) to RA, (*1)
   and RA reports to Pacemaker that it is running correctly.
--- SOME ERROR HAPPENS ---
3: Pacemaker checks the state of Qemu via RA.
   RA checks the state of Qemu using virsh (libvirt).
   Qemu replies "FROZEN" (VM stopped in a failover-safe state) to RA, (*3)
   and RA records this, then replies "STOPPED" to Pacemaker.
(*3): This is what we want to introduce as a new state. Failover-safe means
that Qemu has released its external resources, including some namespaces,
so that they are available to another instance.
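To make proposal B concrete, here is a small Python sketch that replays steps 2 to 4 of the diagram. `FROZEN` is the proposed state and does not exist in libvirt today; the `DomState` enum, its string values, and the choice to leave stopping a running guest out of scope are all assumptions of this sketch. Exit codes are the standard OCF values.

```python
from enum import Enum

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


class DomState(Enum):
    RUNNING = "running"
    SHUTOFF = "shut off"
    FROZEN = "frozen"  # hypothetical: the proposed failover-safe state


def monitor(state: DomState) -> int:
    if state is DomState.RUNNING:
        return OCF_SUCCESS  # step 2: "RUNNING" -> "OK"
    if state in (DomState.FROZEN, DomState.SHUTOFF):
        return OCF_NOT_RUNNING  # step 3: "FROZEN" -> "STOPPED"
    return OCF_ERR_GENERIC


def stop(state: DomState) -> int:
    # Step 4: a FROZEN guest is already failover-safe, so stop only needs to
    # confirm the state; nothing is killed and guest memory is preserved.
    if state in (DomState.FROZEN, DomState.SHUTOFF):
        return OCF_SUCCESS
    return OCF_ERR_GENERIC  # stopping a running guest is outside this sketch


# Replay steps 2-4 of the diagram.
trace = [
    (monitor, DomState.RUNNING),
    (monitor, DomState.FROZEN),
    (stop, DomState.FROZEN),
]
replies = [action(state) for action, state in trace]
print(replies)  # [0, 7, 0]
```

Compared with the current approach, the difference is all in `stop`: it becomes a confirmation rather than a kill, which is what keeps guest memory available for debugging.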
it doesn't need to release the resources. It just needs to not be able to
modify them.
pacemaker on the host won't try to start another instance on the same
host, it will try to start an instance on another host. so you don't need
to worry about releasing memory, file locks, etc. locally. for remote
resources you _can't_ release them gracefully if you crash, so your apps
already need to be able to handle that situation. there's no difference to
the other instances between a machine that gets powered off via STONITH
and a virtual system that gets paused.
Can't pacemaker be configured to start another instance on the same host?
Of course, I agree that this may not be valuable in most situations.