Re: cn_queue.c

From: Evgeniy Polyakov
Date: Fri Apr 01 2005 - 03:35:28 EST


On Thu, 2005-03-31 at 23:57 -0800, Andrew Morton wrote:
> Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:
> >
> > > > static void cn_queue_wrapper(void *data)
> > > > {
> > > > struct cn_callback_entry *cbq = (struct cn_callback_entry *)data;
> > > >
> > > > smp_mb__before_atomic_inc();
> > > > atomic_inc(&cbq->cb->refcnt);
> > > > smp_mb__after_atomic_inc();
> > > > cbq->cb->callback(cbq->cb->priv);
> > > > smp_mb__before_atomic_inc();
> > > > atomic_dec(&cbq->cb->refcnt);
> > > > smp_mb__after_atomic_inc();
> > > >
> > > > cbq->destruct_data(cbq->ddata);
> > > > }
> > >
> > > What on earth are all these barriers for? Barriers should have an
> > > associated comment describing why they were added, and what they are
> > > defending against, becuase it is very hard to tell that from reading the
> > > code.
> > >
> > > Please describe (via code comments) what the refcounting rules are for
> > > these data structures. It all looks very strange. You basically have:
> > >
> > >
> > > atomic_inc(refcnt);
> > > use(some_object);
> > > atomic_dec(refcnt);
> > >
> > > Which looks very racy. Some other CPU could see a zero refcount just
> > > before this CPU did the atomic_inc() and the other CPU will go and free up
> > > some_object. There should be locking associated with refcounting to
> > > prevent such races, and I don't see any.
> >
> > It can not happen in the above case.
> > It can race with callback removing, but remove path
> > call cancel_delayed_work() which calls del_timer_sync(),
> > so above code can not be run or will not run,
> > only after it refcnt is being checked to be zero.
> > It is probably an overhead since sinchronization is done
> > in other places.
> > All above barriers are introduced to remove already not theoretical
> > atomic vs. non-atomic reordering [ppc64 very like it],
> > when reference counter must be decremented after object was used,
> > but due to reordering it can happen before use [and it will be seen
> > atomically on all CPUs],
> > and other CPU may free object based on knowledge that
> > refcnt is zero.
>
> But all the above seems to be tied into the
> poll-until-refcount-goes-to-zero cleanup code. I think - it's quite
> unclear what the refcount protocol is for these objects.
>
> Why cannot we manage object lifetimes here in the same manner as dentries,
> inodes, skbuffs, etc?

It was easier to implement it in a that way.
Differencies will be only in function names - instead of msleep() it
will
be sleep_on.
Since it is atomic refcnt that is being checked wait_event can not be
used there directly.
I can change it's logic, it is quite simple.
Is it ok to postpone it until documentation changes are made in the
source code?

> > > > static struct cn_callback_entry *cn_queue_alloc_callback_entry(struct cn_callback *cb)
> > >
> > > 80 cols again.
> > >
> > > > static void cn_queue_free_callback(struct cn_callback_entry *cbq)
> > > > {
> > > > cancel_delayed_work(&cbq->work);
> > > >
> > > > while (atomic_read(&cbq->cb->refcnt)) {
> > > > printk(KERN_INFO "Waiting for %s to become free: refcnt=%d.\n",
> > > > cbq->pdev->name, atomic_read(&cbq->cb->refcnt));
> > > >
> > > > msleep_interruptible(1000);
> > > >
> > > > if (current->flags & PF_FREEZE)
> > > > refrigerator(PF_FREEZE);
> > > >
> > > > if (signal_pending(current))
> > > > flush_signals(current);
> > > > }
> > > >
> > > > kfree(cbq);
> > > > }
> > >
> > > erp. Can you please do this properly?
> > >
> > > Free the object on the refcount going to zero (with appropriate locking)?
> > > If you want (for some reason) to wait until all users of an object have
> > > gone away then you should take a ref on the object (inside locking), then
> > > sleep on a waitqueue_head embedded in that object. Make all
> > > refcount-droppers do a wake_up.
> > >
> > > Why is this function playing with signals?
> >
> > It is not, just can be interrupted to check state befor timeout expires.
> > It is not a problem to remove signal interrupting here.
>
> If this function is called by userspace tasks then we've just dumped any
> signals which that task might have had pending.

It is only rmmod/insmod who can call it
[it is called from module unloading/error path].
Implemented just to check state when Ctrl+C is pressed.

> > > > break;
> > > > }
> > > > }
> > > > if (!found) {
> > > > atomic_set(&cbq->cb->refcnt, 1);
> > > > list_add_tail(&cbq->callback_entry, &dev->queue_list);
> > > > }
> > > > spin_unlock_bh(&dev->queue_lock);
> > >
> > > Why use spin_lock_bh()? Does none of this code ever get called from IRQ
> > > context?
> >
> > Test documentation module that lives in
> > Documentation/connector/cn_test.c
> > describes different usage cases for connector, and it uses
> > cn_netlink_send()
> > from irq context, which may race with the callback adding/removing.
>
> But cn_netlink_send() takes ->queue_lock. If this CPU happened to be
> holding the same lock in process or bh context it will deadlock in IRQ
> context.

I need to state that sending must be limited to BH context too.
I thought about it when creating locking schema and decided to
not allow hard irq context, since receiving is already limited to
BH context.
I definitely need to put it into documentation.

> >
> > >
> > > > struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
> > > > {
> > > > struct cn_queue_dev *dev;
> > > >
> > > > dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> > > > if (!dev) {
> > > > printk(KERN_ERR "%s: Failed to allocte new struct cn_queue_dev.\n",
> > > > name);
> > > > return NULL;
> > > > }
> > > >
> > > > memset(dev, 0, sizeof(*dev));
> > > >
> > > > snprintf(dev->name, sizeof(dev->name), "%s", name);
> > >
> > > scnprintf?
> >
> > Return value is ignored here, but I will change function though.
>
> scnprintf() null-terminates the destination even if it hits the end of its
> buffer.

Hmm...
snprintf() differs from scnprintf() only by returning value:

int snprintf(char * buf, size_t size, const char *fmt, ...)
{
va_list args;
int i;

va_start(args, fmt);
i=vsnprintf(buf,size,fmt,args);
va_end(args);
return i;
}

int scnprintf(char * buf, size_t size, const char *fmt, ...)
{
va_list args;
int i;

va_start(args, fmt);
i = vsnprintf(buf, size, fmt, args);
va_end(args);
return (i >= size) ? (size - 1) : i;
}

> > >
> > > Again, why is this code playing with signals? Hopefully it is never called
> > > by user tasks - that would be very bad.
> >
> > I will remove signal interrupting.
>
> And use the kthread API, please.

Yep.

> > According to wait queue inside the object - it can have it, but it will
> > be used only for waiting and repeated checking in a slow path.
> > Let's postpone it for now until other isssues are resolved first.
>
> This all ties into the refcounting and object lifetime design. What you
> have there appears quite unconventional. Maybe there's good reason for
> that and maybe the code is race-free. I don't understand what you've done
> there sufficiently well to be able to say.

New object has 0 reference counter when created.
If some work is appointed to the object, then it's counter is atomically
incremented. It is decremented when the work is finished.
If object is supposed to be removed while some work
may be appointed to it, core ensures that no work _is_ appointed,
and atomically disallows[for example removing workqueue, removing
callback, all with appropriate locks being hold]
any other work appointment for the given object.
After it [when no work can be appointed to the object] if object
still has pending work [and thus has it's refcounter not zero],
removing path waits untill appropriate refcnt hits zero.
Since no _new_ work can be appointed at that level it is just
while (atomic_read(refcnt) != 0)
msleep();


--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachment: signature.asc
Description: This is a digitally signed message part