Re: [PATCH 01/23] mm: Introduce revoke_file_mappings.

From: Eric W. Biederman
Date: Mon Jun 01 2009 - 20:12:40 EST


Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, 1 Jun 2009 14:50:26 -0700
> "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> wrote:
>
>> +static void revoke_vma(struct vm_area_struct *vma)
>
> This looks odd.
>
>> +{
>> + struct file *file = vma->vm_file;
>> + struct address_space *mapping = file->f_mapping;
>> + unsigned long start_addr, end_addr, size;
>> + struct mm_struct *mm;
>> +
>> + start_addr = vma->vm_start;
>> + end_addr = vma->vm_end;
>
> We take a copy of start_addr/end_addr (and this end_addr value is never used)
A foolish consistency.

>> + /* Switch out the locks so I can maninuplate this under the mm sem.
>> + * Needed so I can call vm_ops->close.
>> + */
>> + mm = vma->vm_mm;
>> + atomic_inc(&mm->mm_users);
>> + spin_unlock(&mapping->i_mmap_lock);
>> +
>> + /* Block page faults and other code modifying the mm. */
>> + down_write(&mm->mmap_sem);
>> +
>> + /* Lookup a vma for my file address */
>> + vma = find_vma(mm, start_addr);
>
> Then we look up a vma. Is there reason to believe that this will
> differ from the incoming arg which we just overwrote? Maybe the code
> is attempting to handle racing concurrent mmap/munmap activity? If so,
> what are the implications of this?

Yes it is. The file based index is only safe while we hold the i_mmap_lock.
The manipulation that needs to happen under the mmap_sem.

So I drop all of the locks and restart. And use the time honored
kernel practice of relooking up the thing I am going to manipulate.
As long as it is for the same file I don't care.

> I _think_ that what the function is attempting to do is "unmap the vma
> which covers the address at vma->start_addr". If so, why not just pass
> it that virtual address?

Actually it is unmapping a vma for the file I am revoking. I hand it
one and then it does an address space jig.

> Anyway, it's all a bit obscure and I do think that the semantics and
> behaviour should be carefully explained in a comment, no?
>
>> + if (vma->vm_file != file)
>> + goto out;
>
> This strengthens the theory that some sort of race-management is
> happening here.

Totally. I dropped all of my locks so I am having to restart in
a different locking context.

>> + start_addr = vma->vm_start;
>> + end_addr = vma->vm_end;
>> + size = end_addr - start_addr;
>> +
>> + /* Unlock the pages */
>> + if (mm->locked_vm && (vma->vm_flags & VM_LOCKED)) {
>> + mm->locked_vm -= vma_pages(vma);
>> + vma->vm_flags &= ~VM_LOCKED;
>> + }
>> +
>> + /* Unmap the vma */
>> + zap_page_range(vma, start_addr, size, NULL);
>> +
>> + /* Unlink the vma from the file */
>> + unlink_file_vma(vma);
>> +
>> + /* Close the vma */
>> + if (vma->vm_ops && vma->vm_ops->close)
>> + vma->vm_ops->close(vma);
>> + fput(vma->vm_file);
>> + vma->vm_file = NULL;
>> + if (vma->vm_flags & VM_EXECUTABLE)
>> + removed_exe_file_vma(vma->vm_mm);
>> +
>> + /* Repurpose the vma */
>> + vma->vm_private_data = NULL;
>> + vma->vm_ops = &revoked_vm_ops;
>> + vma->vm_flags &= ~(VM_NONLINEAR | VM_CAN_NONLINEAR);
>> +out:
>> + up_write(&mm->mmap_sem);
>> + spin_lock(&mapping->i_mmap_lock);
>> +}
>
> Also, I'm not a bit fan of the practice of overwriting the value of a
> formal argument, especially in a function which is this large and
> complex. It makes the code harder to follow, because the one variable
> holds two conceptually different things within the span of the same
> function. And it adds risk that someone will will later access a field
> of *vma and it will be the wrong vma. Worse, the bug is only exposed
> under exeedingly rare conditions.
>
> So.. Use a new local, please.

We can never legitimately have more than one vma manipulated in this function.
As for the rest. I guess I just assumed that the reader of the code would
have a basic understanding of the locking rules for those data structures.

Certainly the worst thing I suffer from is being close to the code,
and not realizing which pieces are not obvious to a naive observer.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/