Re: [PATCH tip/core/rcu 1/2] rcu: Parallelize and economize NOCB kthread wakeups
From: Paul E. McKenney
Date: Wed Aug 13 2014 - 09:01:04 EST
On Wed, Aug 13, 2014 at 11:14:39AM +0530, Amit Shah wrote:
> On (Tue) 12 Aug 2014 [14:41:51], Paul E. McKenney wrote:
> > On Tue, Aug 12, 2014 at 02:39:36PM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 12, 2014 at 09:06:21AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 12, 2014 at 11:03:21AM +0530, Amit Shah wrote:
> > >
> > > [ . . . ]
> > >
> > > > > I know of only virtio-console doing this (via userspace only,
> > > > > though).
> > > >
> > > > As in userspace within the guest? That would not work. The userspace
> > > > that the qemu is running in might. There is a way to extract ftrace info
> > > > from crash dumps, so one approach would be "sendkey alt-sysrq-c", then
> > > > pull the buffer from the resulting dump. For all I know, there might also
> > > > be some script that uses the qemu "x" command to get at the ftrace buffer.
> > > >
> > > > Again, I cannot reproduce this, and I have been through the code several
> > > > times over the past few days, and am not seeing it. I could start
> > > > sending you random diagnostic patches, but it would be much better if
> > > > we could get the trace data from the failure.
>
> I think the only recourse I now have is to dump the guest state from
> qemu, and attempt to find the ftrace buffers by poking pages and
> finding some ftrace-like struct... and then dumping the buffers.

The data exists in the qemu guest state, so it would be good to have
it one way or another. My current (perhaps self-serving) guess is that
you have come up with a way to trick qemu into dropping IPIs.

> > > Hearing no objections, random patch #1. The compiler could in theory
> > > cause trouble without this patch, so there is some possibility that
> > > it is a fix.
> >
> > #2... This would have been a problem without the earlier patch, but
> > who knows? (#1 moved from theoretically possible but not on x86 to
> > maybe on x86 given a sufficiently malevolent compiler with the
> > patch that you located with bisection.)
>
> I tried all 3 patches individually, and all 3 together, no success.

I am not at all surprised. You would have to have an extremely malevolent
compiler for two of them to have any effect, and you would have to have
someone invoking call_rcu() with irqs disabled from idle for the other
to have any effect. Which is why I missed seeing them the first three
times I reviewed this code over the past few days.
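
To make the "irqs disabled from idle" case concrete, here is a rough
userspace model of the check that patch #2 adds (quoted below). It is
only a sketch under stated assumptions: struct cpu_state,
needs_rcu_core_kick(), model_call_rcu_nocb_tail(), and rcu_core_kicks
are hypothetical names that stand in for irqs_disabled_flags(),
rcu_is_watching(), cpu_online(), and invoke_rcu_core(). None of it is
kernel code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the three facts the patch tests. */
struct cpu_state {
    bool irqs_disabled;  /* models irqs_disabled_flags(flags) */
    bool rcu_watching;   /* models rcu_is_watching() */
    bool cpu_online;     /* models cpu_online(smp_processor_id()) */
};

/* Counts how often the model would have called invoke_rcu_core(). */
static int rcu_core_kicks;

/*
 * True only when all three conditions from the patch hold: interrupts
 * are disabled, RCU is not watching (extended quiescent state), and
 * the CPU is online.
 */
static bool needs_rcu_core_kick(const struct cpu_state *s)
{
    return s->irqs_disabled && !s->rcu_watching && s->cpu_online;
}

/* Models the tail of __call_rcu_nocb() after the callback is queued. */
static void model_call_rcu_nocb_tail(const struct cpu_state *s)
{
    if (needs_rcu_core_kick(s))
        rcu_core_kicks++;  /* the kernel would call invoke_rcu_core() */
}
```

In this model only the combination "irqs disabled, RCU not watching,
CPU online" bumps the counter. Every other caller sees no change in
behavior, which is consistent with the patch having no visible effect
unless something really does call call_rcu() that way.
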
> My gcc is gcc-4.8.3-1.fc20.x86_64. I'm using a fairly up-to-date Fedora
> 20 system on my laptop for these tests.
>
> Curiously, patches 1 and 3 applied fine, but this one had a conflict.
>
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 1dc72f523c4a..1da605740e8d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2137,6 +2137,17 @@ static bool __call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *rhp,
>
> I have this hunk at line 2161, and...
>
> >                  trace_rcu_callback(rdp->rsp->name, rhp,
> >                                     -atomic_long_read(&rdp->nocb_q_count_lazy),
> >                                     -atomic_long_read(&rdp->nocb_q_count));
> > +
> > +        /*
> > +         * If called from an extended quiescent state with interrupts
> > +         * disabled, invoke the RCU core in order to allow the idle-entry
> > +         * deferred-wakeup check to function.
> > +         */
> > +        if (irqs_disabled_flags(flags) &&
> > +            !rcu_is_watching() &&
> > +            cpu_online(smp_processor_id()))
> > +                invoke_rcu_core();
> > +
> >          return true;
>
> I have return 1; here.
>
> I'm on linux.git, c8d6637d0497d62093dbba0694c7b3a80b79bfe1.

I am working on top of my -rcu tree, which contains the change from "1" to
"true" relative to current mainline. So this will resolve itself, and
you should be OK fixing up the conflict in either direction.

Thanx, Paul