Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall

From: Jann Horn
Date: Fri Jul 02 2021 - 07:51:39 EST


On Fri, Jul 2, 2021 at 8:25 AM Andrei Vagin <avagin@xxxxxxxxx> wrote:
> On Mon, Jun 28, 2021 at 06:13:29PM +0200, Jann Horn wrote:
> > On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin <avagin@xxxxxxxxx> wrote:
> > > +static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm)
> > > +{
> > > +	struct task_struct *tsk = current;
> > > +	struct mm_struct *active_mm;
> > > +
> > > +	task_lock(tsk);
> > > +	/* Hold off tlb flush IPIs while switching mm's */
> > > +	local_irq_disable();
> > > +
> > > +	sync_mm_rss(prev_mm);
> > > +
> > > +	vmacache_flush(tsk);
> > > +
> > > +	active_mm = tsk->active_mm;
> > > +	if (active_mm != target_mm) {
> > > +		mmgrab(target_mm);
> > > +		tsk->active_mm = target_mm;
> > > +	}
> > > +	tsk->mm = target_mm;
> >
> > I'm pretty sure you're not currently allowed to overwrite the ->mm
> > pointer of a userspace thread. For example, zap_threads() assumes that
> > all threads running under a process have the same ->mm. (And if you're
> > fiddling with ->mm stuff, you should probably CC linux-mm@.)
> >
> > As far as I understand, only kthreads are allowed to do this (as
> > implemented in kthread_use_mm()).
>
> kthread_use_mm() was renamed from use_mm in the v5.8 kernel. Before
> that, it wasn't used for user processes in the kernel, but it was
> exported for modules, and we used it without any visible problems. We
> understood that there could be some issues, such as zap_threads(), and
> that was one of the reasons why we decided to introduce this system call.
>
> I understand that there are no places in the kernel where we change the
> mm of user threads back and forth, but are there any real reasons why
> we should not do that? I agree that zap_threads() should be fixed, but
> that will be the easy one.

My point is that if you break a preexisting assumption like this,
you'll have to go through the kernel and search for places that rely
on this assumption, and fix them up, which may potentially require
thinking about what kinds of semantics would actually be appropriate
there. Like the MCE killing logic (collect_procs_anon() and such). And
current_is_single_threaded(), in which the current patch probably
leads to logic security bugs. And __uprobe_perf_filter(). Before my
refactoring of the ELF coredump logic in kernel 5.10 (commit
b2767d97f5ff75 and the ones before it), you'd have also probably
created memory corruption bugs in races between elf_core_dump() and
syscalls like mmap()/munmap(). (Note that this is not necessarily an
exhaustive list.)
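
To make the current_is_single_threaded() concern a bit more concrete,
here is a toy userspace model, not kernel code. Every toy_* name is
made up for illustration, and the real function does more than this
(as far as I recall it also looks at thread-group liveness and
mm_users before walking tasks). The sketch only assumes that such a
check decides "nobody else uses my mm" by comparing other tasks' ->mm
pointers at one point in time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for mm_struct and task_struct; not the kernel types. */
struct toy_mm {
	int id;
};

struct toy_task {
	struct toy_mm *mm;
};

/*
 * Toy version of an "am I the only user of my mm?" check: scan all
 * tasks and report false if any other task currently points at the
 * same mm. It only sees the ->mm value at the moment of the scan.
 */
static bool toy_is_sole_mm_user(const struct toy_task *self,
				const struct toy_task *tasks, size_t n)
{
	assert(self->mm != NULL);
	for (size_t i = 0; i < n; i++) {
		if (&tasks[i] != self && tasks[i].mm == self->mm)
			return false;
	}
	return true;
}

/* Toy analogue of swapping a task's ->mm; returns the previous mm. */
static struct toy_mm *toy_swap_mm(struct toy_task *task,
				  struct toy_mm *target)
{
	struct toy_mm *prev = task->mm;

	task->mm = target;
	return prev;
}
```

With two tasks that share one mm, the check correctly says "not
alone". If the second task temporarily swaps to a different mm, the
same check says "alone", even though that task will swap back and
touch the shared mm again. That is the kind of silently broken
assumption that would need to be audited for at each call site.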