Re: clone3: allow creation of time namespace with offset

From: Adrian Reber
Date: Wed Mar 25 2020 - 03:58:54 EST


On Tue, Mar 24, 2020 at 06:56:49PM +0100, Christian Brauner wrote:
> On Tue, Mar 24, 2020 at 05:25:46PM +0100, Adrian Reber wrote:
> > On Tue, Mar 24, 2020 at 05:09:45PM +0100, Christian Brauner wrote:
> > > On Fri, Mar 20, 2020 at 11:33:55AM -0700, Andrei Vagin wrote:
> > > > On Thu, Mar 19, 2020 at 11:29:55AM +0100, Christian Brauner wrote:
> > > > > On Thu, Mar 19, 2020 at 09:16:43AM +0100, Arnd Bergmann wrote:
> > > > > > On Thu, Mar 19, 2020 at 9:11 AM Adrian Reber <areber@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > With Arnd's idea of only using nanoseconds, timens_offset would then
> > > > > > > contain something like this:
> > > > > > >
> > > > > > > struct timens_offset {
> > > > > > > __aligned_s64 monotonic_offset_ns;
> > > > > > > __aligned_s64 boottime_offset_ns;
> > > > > > > };
> > > > > > >
> > > > > > > I kind of prefer adding boottime and monotonic directly to struct clone_args
> > > > > > >
> > > > > > > __aligned_u64 tls;
> > > > > > > __aligned_u64 set_tid;
> > > > > > > __aligned_u64 set_tid_size;
> > > > > > > + __aligned_s64 monotonic_offset_ns;
> > > > > > > + __aligned_s64 boottime_offset_ns;
> > > > > > > };
> > > > > >
> > > > > > I would also prefer the second approach using two 64-bit integers
> > > > > > instead of a pointer, as it keeps the interface simpler to implement
> > > > > > and simpler to interpret by other tools.
> > > > >
> > > > > Why I don't like has two reasons. There's the scenario where we have
> > > > > added new extensions after the new boottime member and then we introduce
> > > > > another offset. Then you'd be looking at:
> > > > >
> > > > > __aligned_u64 tls;
> > > > > __aligned_u64 set_tid;
> > > > > __aligned_u64 set_tid_size;
> > > > > + __aligned_s64 monotonic_offset_ns;
> > > > > + __aligned_s64 boottime_offset_ns;
> > > > > __aligned_s64 something_1
> > > > > __aligned_s64 anything_2
> > > > > + __aligned_s64 sometime_offset_ns
> > > > >
> > > > > which bothers me just by looking at it. That's in addition to adding two
> > > > > new members to the struct when most people will never set CLONE_NEWTIME.
> > > > > We'll also likely have more features in the future that will want to
> > > > > pass down more info than we want to directly expose in struct
> > > > > clone_args, e.g. for a long time I have been thinking about adding a
> > > > > struct for CLONE_NEWUSER that allows you to specify the id mappings you
> > > > > want the new user namespace to get. We surely don't want to force all
> > > > > new info into the uppermost struct. So I'm not convinced we should here.
> > > >
> > > > I think here we can start thinking about a netlink-like interface.
> > >
> > > I think netlink is just not a great model for an API and I would not
> > > want us to go down that route.
> > >
> > > I kept thinking about this for a bit and I think that we will end up
> > > growing more namespace-related functionality. So one thing that came to
> > > my mind is the following layout:
> > >
> > > struct {
> > > struct {
> > > __s64 monotonic;
> > > __s64 boot;
> > > } time;
> > > } namespaces;
> > >
> > > struct _clone_args {
> > > __aligned_u64 flags;
> > > __aligned_u64 pidfd;
> > > __aligned_u64 child_tid;
> > > __aligned_u64 parent_tid;
> > > __aligned_u64 exit_signal;
> > > __aligned_u64 stack;
> > > __aligned_u64 stack_size;
> > > __aligned_u64 tls;
> > > __aligned_u64 set_tid;
> > > __aligned_u64 set_tid_size;
> > > __aligned_u64 namespaces;
> > > __aligned_u64 namespaces_size;
> > > };
> > >
> > > Then when we end up adding id mapping support for CLONE_NEWUSER we can
> > > extend this with:
> > >
> > > struct {
> > > struct {
> > > __aligned_u64 monotonic;
> > > __aligned_u64 boot;
>
> s/__aligned_u64/__s64/g
>
> Sorry, leftover from my first draft.
>
> > > } time;
> > >
> > > struct {
> > > /* id mapping members */
> > > } user;
> > > } namespaces;
> > >
> > > Thoughts? Other ideas?
> >
> > Works for me.
> >
> > If we add the user namespace id mappings and then at some point a third
> > element for the time namespace appears it would also start to be mixed.
> > Just as you mentioned that a few mails ago.
>
> I think you misunderstand me or I'm misunderstanding you. That new time
> namespace member would go into struct time {} so
>
> struct {
> struct {
> __s64 monotonic;
> __s64 boot;
> __s64 someothertime;
> } time;
>
> struct {
> /* id mapping members */
> } user;
> } namespaces;

My question was about how does the kernel know how 'struct namespaces'
is structured. How can an older kernel (which only is aware of two
clocks) deal with a, like in this example, third clock. Will the size
'__aligned_u64 namespaces_size' be used for versioning?

Adrian