Re: [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd_list from the Xen guest code

From: Ingo Molnar
Date: Fri Jun 12 2015 - 04:04:43 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Jun 12, 2015 00:23, "Ingo Molnar" <mingo@xxxxxxxxxx> wrote:
> >
> > We might make it so: but that would mean restricting certain clone_flags
> > variants - not sure that's possible with our current ABI usage?
>
> We already do that. You can't share signal info unless you share the mm. And a
> shared signal state is what defines a thread group.
>
> So I think the only issue is that ->mm can become NULL when the thread group
> leader dies - a non-NULL mm should always be shared among all threads.

Indeed, we do that in exit_mm().

So we could add tsk->mm_leader or so, which does not get cleared and which the
scheduler does not look at, but I'm not sure it's entirely safe that way: we don't
have a refcount, and when the last thread exits it becomes bogus for a small
window until the zombie leader is unlinked from the task list.

To close that race we'd have __mmdrop() or so clear out tsk->mm_leader - but the
task doing the mmdrop() might be a lazy thread totally unrelated to the original
thread group so we don't know which tsk->mm_leader to clear out.

To solve that we'd have to track the leader owning an MM in mm_struct - which gets
interesting for the exec() case where the thread group gets a new leader, so we'd
have to re-link the mm's leader pointer there.

So unless I missed some simpler solution, there are a good number of steps where
this could go wrong, in small-looking race windows. How about we just live with
iterating through all tasks instead of just all processes, once per 512 GB of
memory mapped?

Thanks,

Ingo