Re: [PATCH v2 5/6] mm, cpuset: always use seqlock when changing task's nodemask

From: Michal Hocko
Date: Tue May 23 2017 - 03:15:30 EST


On Wed 17-05-17 10:11:39, Vlastimil Babka wrote:
> When updating task's mems_allowed and rebinding its mempolicy due to cpuset's
> mems being changed, we currently only take the seqlock for writing when either
> the task has a mempolicy, or the new mems has no intersection with the old
> mems. This should be enough to prevent a parallel allocation seeing no
> available nodes, but the optimization is IMHO unnecessary (cpuset updates
> should not be frequent), and we still potentially risk issues if the
> intersection of new and old nodes has limited amount of free/reclaimable
> memory. Let's just use the seqlock for all tasks.

Agreed

> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
> kernel/cgroup/cpuset.c | 29 ++++++++---------------------
> 1 file changed, 8 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index dfd5b420452d..26a1c360a481 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1038,38 +1038,25 @@ static void cpuset_post_attach(void)
> * @tsk: the task to change
> * @newmems: new nodes that the task will be set
> *
> - * In order to avoid seeing no nodes if the old and new nodes are disjoint,
> - * we structure updates as setting all new allowed nodes, then clearing newly
> - * disallowed ones.
> + * We use the mems_allowed_seq seqlock to safely update both tsk->mems_allowed
> + * and rebind an eventual tasks' mempolicy. If the task is allocating in
> + * parallel, it might temporarily see an empty intersection, which results in
> + * a seqlock check and retry before OOM or allocation failure.
> */
> static void cpuset_change_task_nodemask(struct task_struct *tsk,
> nodemask_t *newmems)
> {
> - bool need_loop;
> -
> task_lock(tsk);
> - /*
> - * Determine if a loop is necessary if another thread is doing
> - * read_mems_allowed_begin(). If at least one node remains unchanged and
> - * tsk does not have a mempolicy, then an empty nodemask will not be
> - * possible when mems_allowed is larger than a word.
> - */
> - need_loop = task_has_mempolicy(tsk) ||
> - !nodes_intersects(*newmems, tsk->mems_allowed);
>
> - if (need_loop) {
> - local_irq_disable();
> - write_seqcount_begin(&tsk->mems_allowed_seq);
> - }
> + local_irq_disable();
> + write_seqcount_begin(&tsk->mems_allowed_seq);
>
> nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
> mpol_rebind_task(tsk, newmems);
> tsk->mems_allowed = *newmems;
>
> - if (need_loop) {
> - write_seqcount_end(&tsk->mems_allowed_seq);
> - local_irq_enable();
> - }
> + write_seqcount_end(&tsk->mems_allowed_seq);
> + local_irq_enable();
>
> task_unlock(tsk);
> }
> --
> 2.12.2

--
Michal Hocko
SUSE Labs