Re: [PATCH v5 3.1.0-rc4-tip 4/26] uprobes: Define hooks formmap/munmap.

From: Oleg Nesterov
Date: Mon Oct 03 2011 - 09:42:27 EST


On 09/20, Srikar Dronamraju wrote:
>
> @@ -739,6 +740,10 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> mm->pmd_huge_pte = NULL;
> #endif
> +#ifdef CONFIG_UPROBES
> + atomic_set(&mm->mm_uprobes_count,
> + atomic_read(&oldmm->mm_uprobes_count));

Hmm. Why this can't race with install_breakpoint/remove_breakpoint
between _read and _set ?

What about VM_DONTCOPY vma's with breakpoints ?

> -static int match_uprobe(struct uprobe *l, struct uprobe *r)
> +static int match_uprobe(struct uprobe *l, struct uprobe *r, int *match_inode)
> {
> + /*
> + * if match_inode is non NULL then indicate if the
> + * inode atleast match.
> + */
> + if (match_inode)
> + *match_inode = 0;
> +
> if (l->inode < r->inode)
> return -1;
> if (l->inode > r->inode)
> return 1;
> else {
> + if (match_inode)
> + *match_inode = 1;
> +

It is very possible I missed something, but imho this looks confusing.

This close_match logic is only needed for build_probe_list() and
dec_mm_uprobes_count(), and both do not actually need the returned
uprobe.

Instead of complicating match_uprobe() and __find_uprobe(), perhaps
it makes sense to add "struct rb_node *__find_close_rb_node(inode)" ?

> +static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe)
> {
> /* Placeholder: Yet to be implemented */
> + if (!uprobe->consumers)
> + return 0;

How it is possible to see ->consumers == NULL?

OK, afaics it _is_ possible, but only because unregister does del_consumer()
without ->i_mutex, but this is bug afaics (see the previous email).

Another user is mmap_uprobe() and it checks ->consumers != NULL itself (but
see below).

> +int mmap_uprobe(struct vm_area_struct *vma)
> +{
> + struct list_head tmp_list;
> + struct uprobe *uprobe, *u;
> + struct inode *inode;
> + int ret = 0;
> +
> + if (!valid_vma(vma))
> + return ret; /* Bail-out */
> +
> + inode = igrab(vma->vm_file->f_mapping->host);
> + if (!inode)
> + return ret;
> +
> + INIT_LIST_HEAD(&tmp_list);
> + mutex_lock(&uprobes_mmap_mutex);
> + build_probe_list(inode, &tmp_list);
> + list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> + loff_t vaddr;
> +
> + list_del(&uprobe->pending_list);
> + if (!ret && uprobe->consumers) {
> + vaddr = vma->vm_start + uprobe->offset;
> + vaddr -= vma->vm_pgoff << PAGE_SHIFT;
> + if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
> + continue;
> + ret = install_breakpoint(vma->vm_mm, uprobe);

So. We are adding the new mapping, we should find all breakpoints this
file has in the start/end range.

We are holding ->mmap_sem... this seems enough to protect against the
races with register/unregister. Except, what if __register_uprobe()
fails? In this case __unregister_uprobe() does delete_uprobe() at the
very end. What if mmap mmap_uprobe() is called right before delete_?

> +static void dec_mm_uprobes_count(struct vm_area_struct *vma,
> + struct inode *inode)
> +{
> + struct uprobe *uprobe;
> + struct rb_node *n;
> + unsigned long flags;
> +
> + n = uprobes_tree.rb_node;
> + spin_lock_irqsave(&uprobes_treelock, flags);
> + uprobe = __find_uprobe(inode, 0, &n);
> +
> + /*
> + * If indeed there is a probe for the inode and with offset zero,
> + * then lets release its reference. (ref got thro __find_uprobe)
> + */
> + if (uprobe)
> + put_uprobe(uprobe);
> + for (; n; n = rb_next(n)) {
> + loff_t vaddr;
> +
> + uprobe = rb_entry(n, struct uprobe, rb_node);
> + if (uprobe->inode != inode)
> + break;
> + vaddr = vma->vm_start + uprobe->offset;
> + vaddr -= vma->vm_pgoff << PAGE_SHIFT;
> + if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
> + continue;
> + atomic_dec(&vma->vm_mm->mm_uprobes_count);

So, this does atomic_dec() for each bp in this vma?

And the caller is

> @@ -1337,6 +1338,9 @@ unsigned long unmap_vmas(struct mmu_gather *tlb,
> if (unlikely(is_pfn_mapping(vma)))
> untrack_pfn_vma(vma, 0, 0);
>
> + if (vma->vm_file)
> + munmap_uprobe(vma);

Doesn't look right...

munmap_uprobe() assumes that the whole region goes away. This is
true in munmap() case afaics, it does __split_vma() if necessary.

But what about truncate() ? In this case this vma is not unmapped,
but unmap_vmas() is called anyway and [start, end) can be different.
IOW, unless I missed something (this is very possible) we can do
more atomic_dec's then needed.

Also, truncate() obviously changes ->i_size. Doesn't this mean
unregister_uprobe() should return if offset > i_size ? We need to
free uprobes anyway.

MADV_DONTNEED? It calls unmap_vmas() too. And application can do
madvise(DONTNEED) in a loop.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/