Re: [PATCH 0/7][v8] Container-init signal semantics

From: Daniel Lezcano
Date: Thu Feb 19 2009 - 09:59:45 EST

Next message: Alan Stern: "Re: [RFD] Automatic suspend"
Previous message: Ingo Molnar: "Re: [SCSI][REGRESSION][BISECTED] Disk errors loop forever in 2.6.29"
In reply to: Eric W. Biederman: "Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary"
Next in thread: Oleg Nesterov: "Re: [PATCH 0/7][v8] Container-init signal semantics"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Sukadev Bhattiprolu wrote:

Patch 5/7 is new in this set and fixes a bug. Remaining patches are
just a forward-port from previous version and I believe they address
all comments I have received.

Oleg please sign-off/ack if you agree.

---

Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e., SIG_DFL signals that terminate the process).

But the same container-init must behave like a normal process to
processes in ancestor namespaces, and so if it receives the same fatal
signal from a process in an ancestor namespace, the signal must be
processed.

Implementing these semantics requires that send_signal() determine the
pid namespace of the sender, but since signals can originate from
workqueues/interrupt-handlers, determining the pid namespace of the
sender may not always be possible or safe.

This patchset implements the design/simplified semantics suggested by
Oleg Nesterov. The simplified semantics for container-init are:

- container-init must never be terminated by a signal from a
  descendant process.

- container-init must never be immune to SIGKILL from an ancestor
  namespace (so a process in parent namespace must always be able
  to terminate a descendant container).

- container-init may be immune to unhandled fatal signals (like
  SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP
  are the only reliable signals to a container-init from ancestor
  namespace.
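
For illustration only, the three rules above can be written as a small decision table. This is a user-space sketch, not code from the patchset: the names cinit_outcome and cinit_signal_outcome() are hypothetical, and treating SIGSTOP from inside the container the same way as SIGKILL is an assumption that the rules above do not spell out.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model of the simplified semantics; not kernel code. */
enum cinit_outcome {
	CINIT_DROPPED,		/* signal has no effect on container-init */
	CINIT_DELIVERED,	/* handler runs, or the default action applies */
	CINIT_MAY_BE_DROPPED	/* rules allow, but do not require, dropping */
};

/*
 * sig              signal sent to the container-init
 * from_ancestor_ns true if the sender lives in an ancestor pid namespace
 * has_handler      true if container-init installed a handler for sig
 */
static enum cinit_outcome cinit_signal_outcome(int sig, bool from_ancestor_ns,
					       bool has_handler)
{
	/* SIGKILL and SIGSTOP can never be caught by a handler. */
	bool uncatchable = (sig == SIGKILL || sig == SIGSTOP);

	/*
	 * Rule 2: always reliable when sent from an ancestor namespace.
	 * Rule 1: from inside the container they must not take effect
	 * (assumption: SIGSTOP is treated like SIGKILL here).
	 */
	if (uncatchable)
		return from_ancestor_ns ? CINIT_DELIVERED : CINIT_DROPPED;

	/* A handled signal just runs the handler, whoever sent it. */
	if (has_handler)
		return CINIT_DELIVERED;

	/*
	 * Unhandled fatal signal: dropped when it comes from a descendant
	 * (rule 1), possibly dropped when it comes from an ancestor (rule 3).
	 */
	return from_ancestor_ns ? CINIT_MAY_BE_DROPPED : CINIT_DROPPED;
}
```

With this model, cinit_signal_outcome(SIGUSR1, true, false) yields CINIT_MAY_BE_DROPPED, which is why only SIGKILL/SIGSTOP count as reliable from an ancestor namespace.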

Hi Suka,

I agree with these semantics, they look good.

What is the plan for making the init process die when a system container shuts down?

Let's say we use the "shutdown" command: it will use telinit to go to runlevel 0, and will then do a kill -1.
At that point, the container finishes with a sys_reboot (we take care to make it do nothing, otherwise the real system would shut down). But the init process stays around, so the launcher of the container never knows whether the container has stopped.

Gregory Kurz proposed a solution:
* when shutdown is called and we are not in the init pidns, we kill process 1 of the pid namespace.
* when reboot is called and we are not in the init pidns, we re-exec the init process with the same command line. I guess this could easily be retrieved, given that we are able to display /proc/1/cmdline ;)
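
To make the proposal concrete, here is a rough user-space sketch, not an implementation of sys_reboot: the enums and the functions reboot_action() and split_cmdline() are hypothetical names chosen for illustration. The only outside fact it relies on is that /proc/<pid>/cmdline stores the arguments separated by NUL bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum reboot_request { REQ_SHUTDOWN, REQ_REBOOT };

enum reboot_action {
	ACT_SYSTEM_SHUTDOWN,	/* init pidns: really shut the machine down */
	ACT_SYSTEM_REBOOT,	/* init pidns: really reboot the machine */
	ACT_KILL_NS_INIT,	/* container: kill process 1 of the pidns */
	ACT_REEXEC_NS_INIT	/* container: re-exec init, same command line */
};

/* Decide what a shutdown/reboot request should do under the proposal. */
static enum reboot_action reboot_action(enum reboot_request req,
					bool in_init_pidns)
{
	if (in_init_pidns)
		return req == REQ_SHUTDOWN ? ACT_SYSTEM_SHUTDOWN
					   : ACT_SYSTEM_REBOOT;
	return req == REQ_SHUTDOWN ? ACT_KILL_NS_INIT : ACT_REEXEC_NS_INIT;
}

/*
 * Split a buffer laid out like /proc/<pid>/cmdline (arguments separated
 * by NUL bytes) into argv-style pointers. The caller must make sure the
 * buffer ends with a NUL byte. Returns the number of arguments stored;
 * argv[n] is set to NULL so the array can be handed to an exec call.
 */
static size_t split_cmdline(char *buf, size_t len, char **argv,
			    size_t max_args)
{
	size_t n = 0, i = 0;

	if (max_args == 0)
		return 0;

	while (i < len && n + 1 < max_args) {
		argv[n++] = &buf[i];
		while (i < len && buf[i] != '\0')
			i++;
		i++;	/* step over the NUL separator */
	}
	argv[n] = NULL;
	return n;
}
```

The launcher side then only has to wait for process 1 of the container to exit in the shutdown case, which is the notification that is missing today.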

IMHO, this is a good proposal because it is generic and intuitive, no?

What do you think?

-- Daniel
