Re: [PATCH 03/14] mm/hmm: HMM should have a callback before MM is destroyed v2

From: John Hubbard
Date: Fri Mar 16 2018 - 22:36:55 EST


On 03/16/2018 12:14 PM, jglisse@xxxxxxxxxx wrote:
> From: Ralph Campbell <rcampbell@xxxxxxxxxx>
>

<snip>

> +static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	struct hmm *hmm = mm->hmm;
> +	struct hmm_mirror *mirror;
> +	struct hmm_mirror *mirror_next;
> +
> +	down_write(&hmm->mirrors_sem);
> +	list_for_each_entry_safe(mirror, mirror_next, &hmm->mirrors, list) {
> +		list_del_init(&mirror->list);
> +		if (mirror->ops->release)
> +			mirror->ops->release(mirror);
> +	}
> +	up_write(&hmm->mirrors_sem);
> +}
> +

OK, as for actual code review:

This part of the locking looks good. However, I think it can race against
hmm_mirror_register(), because hmm_mirror_register() will just add a new
mirror regardless of whether hmm_release() has already run.

So:

thread 1                                thread 2
--------------                          -----------------
hmm_release                             hmm_mirror_register
    down_write(&hmm->mirrors_sem);          <blocked: waiting for sem>
        // deletes all list items
    up_write
                                            unblocked: adds new mirror


...so I think we need a way to back out of any pending hmm_mirror_register()
calls, as part of the .release steps, right? It seems hard for the device driver,
which could be inside of hmm_mirror_register(), to handle that. Especially considering
that right now, hmm_mirror_register() will return success in this case--so
there is no indication that anything is wrong.

Maybe hmm_mirror_register() could return an error (and not add to the mirror list)
in such a situation. How's that sound?
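
To make the idea concrete, here is a rough userspace model of it, not a
patch against the series. Everything in it is an assumption for illustration:
the "dead" flag, the choice of -ENODEV, and the hmm_model_* names are all
hypothetical, and a pthread mutex stands in for mirrors_sem taken for write.
The point is only that release marks the hmm as dead under the same lock that
register takes, so a register call that was blocked on the lock sees the flag
and backs out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct hmm_mirror (hypothetical, simplified). */
struct mirror {
	struct mirror *next;
	bool released;		/* stands in for mirror->ops->release() having run */
};

/* Userspace stand-in for struct hmm (hypothetical, simplified). */
struct hmm_model {
	pthread_mutex_t mirrors_sem;	/* models down_write()/up_write() */
	struct mirror *mirrors;		/* singly linked list of mirrors */
	bool dead;			/* assumed flag: set once release has run */
};

static void hmm_model_init(struct hmm_model *hmm)
{
	pthread_mutex_init(&hmm->mirrors_sem, NULL);
	hmm->mirrors = NULL;
	hmm->dead = false;
}

/* Models hmm_release(): mark dead, then drop every mirror, all under the lock. */
static void hmm_model_release(struct hmm_model *hmm)
{
	pthread_mutex_lock(&hmm->mirrors_sem);
	hmm->dead = true;
	while (hmm->mirrors) {
		struct mirror *m = hmm->mirrors;

		hmm->mirrors = m->next;
		m->next = NULL;
		m->released = true;
	}
	pthread_mutex_unlock(&hmm->mirrors_sem);
}

/*
 * Models hmm_mirror_register(): refuse to add a mirror once release has
 * run. The error code is an arbitrary choice for this sketch.
 */
static int hmm_model_mirror_register(struct hmm_model *hmm, struct mirror *m)
{
	int ret = 0;

	assert(hmm && m);

	pthread_mutex_lock(&hmm->mirrors_sem);
	if (hmm->dead) {
		ret = -ENODEV;
	} else {
		m->released = false;
		m->next = hmm->mirrors;
		hmm->mirrors = m;
	}
	pthread_mutex_unlock(&hmm->mirrors_sem);

	return ret;
}
```

With something along these lines, thread 2 in the diagram above would wake up,
see the flag, and get an error back instead of silently landing on a list that
nobody will ever walk again. The driver would still have to handle that error,
but at least it would be told.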

thanks,
--
John Hubbard
NVIDIA