Re: clone3: allow creation of time namespace with offset

From: Aleksa Sarai
Date: Thu Mar 19 2020 - 04:11:15 EST


On 2020-03-17, Michael Kerrisk (man-pages) <mtk.manpages@xxxxxxxxx> wrote:
> [CC += linux-api; please CC on future versions]
>
> On Tue, 17 Mar 2020 at 09:32, Adrian Reber <areber@xxxxxxxxxx> wrote:
> > Requiring nanoseconds as well as seconds for two clocks during clone3()
> > means that it would require 4 additional members to 'struct clone_args':
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_u64 boottime_offset_seconds;
> > +       __aligned_u64 boottime_offset_nanoseconds;
> > +       __aligned_u64 monotonic_offset_seconds;
> > +       __aligned_u64 monotonic_offset_nanoseconds;
> > };
> >
> > To avoid four additional members to 'struct clone_args' this patchset
> > uses another approach:
> >
> >         __aligned_u64 tls;
> >         __aligned_u64 set_tid;
> >         __aligned_u64 set_tid_size;
> > +       __aligned_u64 timens_offset;
> > +       __aligned_u64 timens_offset_size;
> > };
> >
> > timens_offset is a pointer to an array, just as previously done with
> > set_tid, and timens_offset_size is the size of the array.
> >
> > Each element of the timens_offset array is expected to be a struct
> > like this:
> >
> > struct set_timens_offset {
> >         int clockid;
> >         struct timespec val;
> > };
> >
> > This way it is possible to pass the information of multiple clocks with
> > seconds and nanoseconds to clone3().
> >
> > To me this seems the better approach, but I am not totally convinced
> > that it is the right thing. If there are other ideas for how to pass two
> > clock offsets with seconds and nanoseconds to clone3(), I would be happy
> > to hear them.

While I agree this does make the API cleaner, I am a little worried that
it risks killing some of the ideas we discussed for seccomp deep
inspection. In particular, having a pointer to variable-sized data
inside the struct means that now the cBPF program can't just be given a
copy of the struct data from userspace to check.

I'm sure it's a solvable problem (and one we were bound to run into at
some point); it will just mean we need a more complicated way of
filtering such syscalls.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
