Re: [PATCH v5 0/6] arm64: add the time namespace support

From: Andrei Vagin
Date: Thu Jul 23 2020 - 13:41:46 EST


On Wed, Jul 22, 2020 at 07:15:06PM +0100, Catalin Marinas wrote:
> On Mon, Jul 13, 2020 at 06:57:43PM -0700, Andrei Vagin wrote:
> > On Sat, Jul 04, 2020 at 11:40:55PM -0700, Andrei Vagin wrote:
> > > On Wed, Jun 24, 2020 at 01:33:15AM -0700, Andrei Vagin wrote:
> > > > Allocate the time namespace page among VVAR pages and add the logic
> > > > to handle faults on VVAR properly.
> > > >
> > > > If a task belongs to a time namespace then the VVAR page which contains
> > > > the system wide VDSO data is replaced with a namespace specific page
> > > > which has the same layout as the VVAR page. That page has vdso_data->seq
> > > > set to 1 to enforce the slow path and vdso_data->clock_mode set to
> > > > VCLOCK_TIMENS to enforce the time namespace handling path.
> > > >
> > > > The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> > > > update of the VDSO data is in progress, is not really affecting regular
> > > > tasks which are not part of a time namespace as the task is spin waiting
> > > > for the update to finish and vdso_data->seq to become even again.
> > > >
> > > > If a time namespace task hits that code path, it invokes the corresponding
> > > > time getter function which retrieves the real VVAR page, reads host time
> > > > and then adds the offset for the requested clock which is stored in the
> > > > special VVAR page.
> > > >
> > >
> > > > v2: Code cleanups suggested by Vincenzo.
> > > > v3: add a comment in __arch_get_timens_vdso_data.
> > > > v4: - fix an issue reported by the lkp robot.
> > > > - vvar has the same size with/without CONFIG_TIME_NAMESPACE, but the
> > > > timens page isn't allocated on !CONFIG_TIME_NAMESPACE. This
> > > > simplifies criu/vdso migration between different kernel configs.
> > > > v5: - Code cleanups suggested by Mark Rutland.
> > > > - In vdso_join_timens, mmap_write_lock is downgraded to
> > > > mmap_read_lock. The VMA list isn't changed there, zap_page_range
> > > > doesn't require mmap_write_lock.
> > > >
> > > > Reviewed-by: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
> > > > Reviewed-by: Dmitry Safonov <dima@xxxxxxxxxx>
> > >
> > > Hello Will and Catalin,
> > >
> > > Have you had a chance to look at this patch set? I think it is ready to be
> > > merged. Let me know if you have any questions.
> >
> > *friendly ping*
> >
> > If I am doing something wrong, let me know.
>
> Not really, just haven't got around to looking into it. Mark Rutland
> raised a concern (in private) about the safety of multithreaded apps
> but I think you already replied that timens_install() checks for this
> already [1].
>
> Maybe a similar atomicity issue to the one raised by Mark but for
> single-threaded processes: the thread is executing vdso code, gets
> interrupted and a signal handler invokes setns(). Would resuming the
> execution in the vdso code on sigreturn cause any issues?

It will not cause any issues in the kernel. In the userspace,
clock_gettime() can return a clock value with an inconsistent offset, if
a process switches between two non-root namespaces. And it can triggers
SIGSEGV if it switches from a non-root to the root time namespace,
because a time namespace isn't mapped in the root time namespace.

I don't think that we need to handle this case in the kernel. Users
must understand what they are doing and have to write code so that avoid
these sort of situations. In general, I would say that in most cases it
is a bad idea to call setns from a signal handler.

Thanks,
Andrei

>
> [1] https://lore.kernel.org/linux-arm-kernel/d16b5cd1-bdb1-5667-fbda-c622cc795389@xxxxxxxxxx/
>
> --
> Catalin