Re: [PATCH v2 2.6.38-rc8-tip 5/20] 5: Uprobes: register/unregisterprobes.

From: Thomas Gleixner
Date: Tue Mar 15 2011 - 14:17:15 EST


On Tue, 15 Mar 2011, Srikar Dronamraju wrote:
> * Thomas Gleixner <tglx@xxxxxxxxxxxxx> [2011-03-15 15:28:04]:
> > > + list_for_each_entry_safe(mm, tmpmm, &tmp_list, uprobes_list) {
> > > + down_read(&mm->mmap_sem);
> > > + if (!install_uprobe(mm, uprobe))
> > > + ret = 0;
> >
> > Installing it once is success ?
>
> This is a little tricky. My intention was to return success even if one
> install is successful. If we return error, then the caller can go
> ahead and free the consumer. Since we return success if there are
> currently no processes that have mapped this inode, I was tempted to
> return success on atleast one successful install.

Ok. Wants to be documented in a comment.

> >
> > > + list_del(&mm->uprobes_list);
> >
> > Also the locking rules for mm->uprobes_list want to be
> > documented. They are completely non obvious.
> >
> > > + up_read(&mm->mmap_sem);
> > > + mmput(mm);
> > > + }
> > > +
> > > +consumers_add:
> > > + add_consumer(uprobe, consumer);
> > > + mutex_unlock(&uprobes_mutex);
> > > + put_uprobe(uprobe);
> >
> > Why do we drop the refcount here?
>
> The first time uprobe_add gets called for a unique inode:offset
> pair, it sets the refcount to 2 (One for the uprobe creation and the
> other for register activity). From next time onwards it
> increments the refcount by (for register activity) 1.
> The refcount dropped here corresponds to the register activity.
>
> Similarly unregister takes a refcount thro find_uprobe and drops it thro
> del_consumer(). However it drops the creation refcount if and if
> there are no more consumers.

Ok. That wants a few comments perhaps. It's not really obvious.

> I thought of just taking the refcount just for the first register and
> decrement for the last unregister. However register/unregister can race
> with each other causing the refcount to be zero and free the uprobe
> structure even though we were still registering the probe.

Right, that won't work.

> >
> > > + return ret;
> > > +}
> >
> > > + /*
> > > + * There could be other threads that could be spinning on
> > > + * treelock; some of these threads could be interested in this
> > > + * uprobe. Give these threads a chance to run.
> > > + */
> > > + synchronize_sched();
> >
> > This makes no sense at all. We are not holding treelock, we are about
> > to acquire it. Also what does it matter when they spin on treelock and
> > are interested in this uprobe. Either they find it before we remove it
> > or not. So why synchronize_sched()? I find the lifetime rules of
> > uprobe utterly confusing. Could you explain please ?
>
> There could be threads that have hit the breakpoint and are
> entering the notifier code(interrupt context) and then
> do_notify_resume(task context) and trying to acquire the treelock.
> (treelock is held by the breakpoint hit threads in
> uprobe_notify_resume which gets called in do_notify_resume()) The
> current thread that is removing the uprobe from the rb_tree can race
> with these threads and might acquire the treelock before some of the
> breakpoint hit threads. If this happens the interrupted threads have
> to re-read the opcode to see if the breakpoint location no more has the
> breakpoint instruction and retry the instruction. However before it can
> detect and retry, some other thread might insert a breakpoint at that
> location. This can go in a loop.

Ok, that makes sense, but you want to put a lenghty explanation into
the comment above the synchronize_sched() call.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/