Re: [PATCH 3/3] mm: Cause revoke_mappings to wait until all close methods have completed.

From: Eric W. Biederman
Date: Mon Sep 27 2010 - 04:04:59 EST


Pekka Enberg <penberg@xxxxxxxxxx> writes:

> On Sun, Sep 26, 2010 at 2:34 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxxxxxxxx>
>
> The changelog is slightly too terse for me to review this. What's the
> race you're trying to avoid? If the point is to make the revoke task
> wait until everything has been closed, why can't we use the completion
> API here?

Because as far as I can tell completions don't map onto this at all.
At best what we have are moments when the set of vmas goes empty.

The close_count is needed because we take vmas off the lists early to
avoid issues with truncate, so we need something to tell us when a
close has actually finished, or whether one is still in progress.

I used a simple task pointer to wake up instead of a wait queue because
in all of the revoke scenarios I know of it makes sense to serialize at
a higher level, a task pointer is smaller than a wait queue head, and I
am reluctant to grow struct inode beyond what is necessary.

The count has to be present at all times because objects could start
closing before we start the revoke. We can't be ham-handed and grab
every mm->mmap_sem, because from mmput() revoke_vma is called without
the mmap_sem held.

So it looked to me like the cleanest and smallest way to go was to
write an old-fashioned schedule/wait loop of the kind that is usually
hidden behind completions or wait queue logic.

It is my hope that at some point we can be clever and create a union,
either in struct inode proper or in struct address_space, with a few
other fields that cannot be used while revoking a vma (potentially the
truncate count) and reuse them. But I am not clever enough to do
something like that today.

Eric

>> ---
>>  include/linux/fs.h |    2 ++
>>  mm/mmap.c          |   13 ++++++++++++-
>>  mm/revoke.c        |   18 +++++++++++++++---
>>  3 files changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 76041b6..5d3d6b8 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -633,6 +633,8 @@ struct address_space {
>>        const struct address_space_operations *a_ops;   /* methods */
>>        unsigned long           flags;          /* error bits/gfp mask */
>>        struct backing_dev_info *backing_dev_info; /* device readahead, etc */
>> +      struct task_struct      *revoke_task;   /* Who to wake up when all vmas are closed */
>> +      unsigned int            close_count;    /* Cover race conditions with revoke_mappings */
>>        spinlock_t              private_lock;   /* for use by the address_space */
>>        struct list_head        private_list;   /* ditto */
>>        struct address_space    *assoc_mapping; /* ditto */
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 17dd003..3df3193 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -218,6 +218,7 @@ void unlink_file_vma(struct vm_area_struct *vma)
>>                struct address_space *mapping = file->f_mapping;
>>                spin_lock(&mapping->i_mmap_lock);
>>                __remove_shared_vm_struct(vma, file, mapping);
>> +              mapping->close_count++;
>>                spin_unlock(&mapping->i_mmap_lock);
>>        }
>> }
>> @@ -233,9 +234,19 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
>>        if (vma->vm_ops && vma->vm_ops->close)
>>                vma->vm_ops->close(vma);
>>        if (vma->vm_file) {
>> -              fput(vma->vm_file);
>> +              struct address_space *mapping = vma->vm_file->f_mapping;
>>                if (vma->vm_flags & VM_EXECUTABLE)
>>                        removed_exe_file_vma(vma->vm_mm);
>> +
>> +              /* Decrement the close count and wake up a revoker if present */
>> +              spin_lock(&mapping->i_mmap_lock);
>> +              mapping->close_count--;
>> +              if ((mapping->close_count == 0) && mapping->revoke_task)
>> +                      /* Is wake_up_process the right variant of try_to_wake_up? */
>> +                      wake_up_process(mapping->revoke_task);
>> +              spin_unlock(&mapping->i_mmap_lock);
>> +
>> +              fput(vma->vm_file);
>>        }
>>        mpol_put(vma_policy(vma));
>>        kmem_cache_free(vm_area_cachep, vma);
>> diff --git a/mm/revoke.c b/mm/revoke.c
>> index a76cd1a..e19f7df 100644
>> --- a/mm/revoke.c
>> +++ b/mm/revoke.c
>> @@ -143,15 +143,17 @@ void revoke_mappings(struct address_space *mapping)
>>        /* Make any access to previously mapped pages trigger a SIGBUS,
>>         * and stop calling vm_ops methods.
>>         *
>> -       * When revoke_mappings returns invocations of vm_ops->close
>> -       * may still be in progress, but no invocations of any other
>> -       * vm_ops methods will be.
>> +       * When revoke_mappings returns no invocations of any method
>> +       * will be in progress.
>>         */
>>        struct vm_area_struct *vma;
>>        struct prio_tree_iter iter;
>>
>>        spin_lock(&mapping->i_mmap_lock);
>>
>> +      WARN_ON(mapping->revoke_task);
>> +      mapping->revoke_task = current;
>> +
>> restart_tree:
>>        vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX) {
>>                if (revoke_mapping(mapping, vma->vm_mm, vma->vm_start))
>> @@ -164,6 +166,16 @@ restart_list:
>>                        goto restart_list;
>>        }
>>
>> +      while (!list_empty(&mapping->i_mmap_nonlinear) ||
>> +             !prio_tree_empty(&mapping->i_mmap) ||
>> +             mapping->close_count)
>> +      {
>> +              __set_current_state(TASK_UNINTERRUPTIBLE);
>> +              spin_unlock(&mapping->i_mmap_lock);
>> +              schedule();
>> +              spin_lock(&mapping->i_mmap_lock);
>> +      }
>> +      mapping->revoke_task = NULL;
>>        spin_unlock(&mapping->i_mmap_lock);
>> }
>> EXPORT_SYMBOL_GPL(revoke_mappings);
>> --
>> 1.7.2.3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@xxxxxxxxxx ÂFor more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/