Re: Mysterious CFQ crash and RCU

From: Vivek Goyal
Date: Mon Jun 06 2011 - 10:28:16 EST


On Sun, Jun 05, 2011 at 08:56:33AM +0200, Jens Axboe wrote:
> On 2011-06-05 00:48, Paul Bolle wrote:
> > I think I finally found it!
> >
> > The culprit seems to be io_context.ioc_data (not the most clear of
> > names!). It seems to be a single entry "last-hit cache" of an hlist
> > called cic_list. (There are three, subtly different, cic_lists in the
> > CFQ code!) It is not entirely clear, but that last-hit cache can get out
> > of sync with the hlist it is supposed to cache. My guess is that every
> > now and then a member of the hlist gets deleted while it's still in that
> > (single entry) cache. If it then gets retrieved from that cache it
> > already points to poisoned memory. For some strange reason this only
> > results in an Oops if one or more debugging options are set (as are set
> > in the Fedora Rawhide non-stable kernels in which I ran into this). I have
> > no clue whatsoever, why that is ...
> >
> > Anyhow, after ripping out ioc_data this bug seems to have disappeared!
> > Jens, Vivek, could you please have a look at this? In the mean time I
> > hope to pinpoint this issue and draft a small patch to really solve it
> > (ie, not by simply ripping out ioc_data).
>
> Does this fix it? It will introduce a hierarchy that is queue -> ioc
> lock, but as far as I can remember (and tell from a quick look), we
> don't have any dependencies on that order of locking at this moment. So
> should be OK.

Excellent debugging, PaulB. This bug has been troubling us for a very
long time now.

While we are looking at races, I am wondering if we still have more
races between the process exit and queue exit paths. It is a different
matter that these races are not easy to hit.

In the process exit path we do the following.

cfq_exit_single_io_context()
{
        cfqd = cic_to_cfqd(cic);
        q = cfqd->queue;                                <----- Racy
        spin_lock_irqsave(q->queue_lock, flags);        <----- Racy
}

And in the queue exit path we do the following.

cfq_exit_queue()
{
        spin_lock_irq(q->queue_lock);
        __cfq_exit_single_io_context {
                cic->key = cfqd_dead_key(cfqd);
        }
        spin_unlock_irq(q->queue_lock);
        kfree(cfqd);
}

In the process exit path we retrieve cfqd without holding any locks
(except the RCU read lock), and this can race with the queue exit path:
by the time we fetch the queue pointer from cfqd, cfqd might already
have been freed.
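
Roughly, the interleaving I am worried about is something like the
below (just a sketch; exact call sites elided).

CPU0 (process exit)                         CPU1 (queue exit)
-------------------                         -----------------
rcu_read_lock()
cfqd = cic_to_cfqd(cic)
                                            spin_lock_irq(q->queue_lock)
                                            cic->key = cfqd_dead_key(cfqd)
                                            spin_unlock_irq(q->queue_lock)
                                            kfree(cfqd)
q = cfqd->queue                 <----- use after free
spin_lock_irqsave(q->queue_lock, flags)
rcu_read_unlock()

The kfree() happens while CPU0 is still inside its read-side critical
section, and nothing on CPU1 waits for that critical section to finish.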

I think the issue here is that we fetch cfqd = cic->key under RCU, but
free cfqd without waiting for an RCU grace period.

If we introduce an unconditional synchronize_rcu() in cfq_exit_queue(),
then queue teardown becomes very slow for drivers which create lots of
queues and tear them down during boot.
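
IOW, something like the below sketch, which would close the race but
would serialize every queue teardown behind a full grace period.

cfq_exit_queue()
{
        spin_lock_irq(q->queue_lock);
        __cfq_exit_single_io_context {
                cic->key = cfqd_dead_key(cfqd);
        }
        spin_unlock_irq(q->queue_lock);

        synchronize_rcu();      <----- wait for all cic readers to finish
        kfree(cfqd);
}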

If we use call_rcu() to free cfqd, then there is no guarantee that the
request queue object (cfqd->queue) is still around.
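
That is, even something like the below (sketch; the rcu_head member and
the callback name are made up here) only keeps cfqd itself alive for
readers, not the request queue it points to.

static void cfq_cfqd_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct cfq_data, rcu_head));
}

/* in cfq_exit_queue(), instead of kfree(cfqd) */
call_rcu(&cfqd->rcu_head, cfq_cfqd_free_rcu);

A reader holding rcu_read_lock() can now safely look at cfqd, but by
the time it dereferences cfqd->queue the queue may already be gone.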

Even if we take a request queue reference to keep the request queue
object alive, there is no guarantee that queue->queue_lock is still
around, as the driver might have freed it.
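
Something like the below is what I mean (sketch only; assumes cfqd
itself is already kept alive via RCU as above, error handling elided).

rcu_read_lock();
cfqd = cic_to_cfqd(cic);
q = cfqd ? cfqd->queue : NULL;
if (q && blk_get_queue(q) != 0)         <----- takes a kobject ref, 0 on success
        q = NULL;
rcu_read_unlock();

if (q) {
        spin_lock_irqsave(q->queue_lock, flags);  <----- lock may be driver memory
        ...
        spin_unlock_irqrestore(q->queue_lock, flags);
        blk_put_queue(q);
}

The queue object itself stays around, but q->queue_lock can point to a
spinlock embedded in driver data which the driver has already freed.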

I am not sure how to fix this cleanly. Just thought I would raise the
concern while we are at it.

Thanks
Vivek