Re: [RFC PATCH 04/11] mm/mempolicy: modify get_mempolicy call stack to take a task argument

From: Michal Hocko
Date: Tue Nov 28 2023 - 09:07:34 EST


On Wed 22-11-23 16:11:53, Gregory Price wrote:
[...]
> @@ -928,7 +929,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
> * vma/shared policy at addr is NULL. We
> * want to return MPOL_DEFAULT in this case.
> */
> - mm = current->mm;
> + if (task == current) {
> + /*
> + * original behavior allows a kernel task changing its
> + * own policy to avoid the condition in get_task_mm,
> + * so we'll directly access
> + */
> + mm = task->mm;
> + mmget(mm);

Do we actually have any kernel thread that would call this? Does it
actually make sense to support?

> + } else
> + mm = get_task_mm(task);
> mmap_read_lock(mm);
> vma = vma_lookup(mm, addr);
> if (!vma) {
> @@ -947,8 +957,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
> return -EINVAL;
> else {
> /* take a reference of the task policy now */
> - pol = current->mempolicy;
> + task_lock(task);
> + pol = task->mempolicy;
> mpol_get(pol);
> + task_unlock(task);
> }
>
> if (!pol) {
> @@ -962,12 +974,13 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
> vma = NULL;
> mmap_read_unlock(mm);
> err = lookup_node(mm, addr);
> + mmput(mm);
> if (err < 0)
> goto out;
> *policy = err;
> - } else if (pol == current->mempolicy &&
> + } else if (pol == task->mempolicy &&
> pol->mode == MPOL_INTERLEAVE) {
> - *policy = next_node_in(current->il_prev, pol->nodes);
> + *policy = next_node_in(task->il_prev, pol->nodes);

This is racy without task_lock which I do not think is helde but it also
seems this is not a big deal. pol is ref. counted so it won't go away
and if the task->mempolicy changes then the return value could be bogus
but this seems acceptable. It would be good to put a comment here that
this is actually deliberate.

--
Michal Hocko
SUSE Labs