Re: x86_pmu_start WARN_ON.

From: Peter Zijlstra
Date: Wed Feb 19 2014 - 05:20:10 EST


On Tue, Feb 18, 2014 at 05:20:57PM -0500, Vince Weaver wrote:
> On Tue, 18 Feb 2014, Vince Weaver wrote:
>
> > On Mon, 17 Feb 2014, Peter Zijlstra wrote:
> >
> > > Enable CONFIG_FRAME_POINTER for better stack traces; I suspect the
> > > list_del_event() is just random stack garbage. The path that makes sense
> > > is:
> > > wait_rcu()->__wait_for_common()->schedule_timeout()
> >
> > Here's an updated stack trace on 3.14-rc3 with CONFIG_FRAME_POINTER
> > enabled, in case it's helpful:
>
> Still chasing this, although all I can add are these debug messages:
>
> [ 140.812003] PROBLEM: n_events=2 n_added=2 VMW: idx=33 state=f00 type=0 config=0 samp_per=5e6069eb0
> [ 140.812003] ALL: VMW: Num=0 idx=33 state=f00 type=0 config=0 samp_per=5e6069eb0
> [ 140.812003] ALL: VMW: Num=1 idx=0 state=3 type=0 config=1 samp_per=0
>
> So when the WARN gets triggered there are only two events in the event
> list, the NMI watchdog which has already been enabled somehow (that f00
> I stuck in, pmu_start sets it to f00 instead of 00 to make sure it wasn't
> something stomping on memory) and the precise instructions event.
>
> I still have a hard time following what all the schedule-in code is doing.

Yes, I got it once, then promptly forgot it. It all became the thing it
is because AMD Fam15 had some horrible constraints.

So in general it tries to map events to counters in order of decreasing
constraints (so it starts with the most constrained events).
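
To make the ordering concrete, here is a minimal user-space sketch of
"most constrained first". It is not the kernel code: struct sketch_event,
constraint_weight() and sketch_assign_events() are made-up names, a
constraint is assumed to be a plain bitmask of allowed counters, and it
makes no attempt to handle overlapping constraints.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an event's constraint:
 * bit i set means the event may run on counter i. */
struct sketch_event {
	uint64_t allowed_mask;
};

/* Number of counters an event may use; lower weight = more constrained. */
static int constraint_weight(uint64_t mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1)
		w++;
	return w;
}

/*
 * Greedy assignment: walk the weights from 1 upwards and, for each event
 * of that weight, take the lowest free counter it allows.
 * assign[i] receives the counter for event i, or -1 if none was found.
 * Returns the number of events left without a counter.
 */
static int sketch_assign_events(const struct sketch_event *ev, int n,
				int num_counters, int *assign)
{
	uint64_t valid, used = 0;
	int unsched = 0;

	assert(num_counters > 0 && num_counters <= 64);
	valid = (num_counters == 64) ? ~0ULL : (1ULL << num_counters) - 1;

	for (int i = 0; i < n; i++)
		assign[i] = -1;

	for (int w = 1; w <= num_counters; w++) {
		for (int i = 0; i < n; i++) {
			uint64_t mask = ev[i].allowed_mask & valid;

			if (constraint_weight(mask) != w)
				continue;

			for (int c = 0; c < num_counters; c++) {
				uint64_t bit = 1ULL << c;

				if ((mask & bit) && !(used & bit)) {
					used |= bit;
					assign[i] = c;
					break;
				}
			}
		}
	}

	for (int i = 0; i < n; i++)
		if (assign[i] < 0)
			unsched++;
	return unsched;
}
```

With event 0 allowed on counters {0,1} and event 1 allowed only on
counter 0, walking the events in submission order would hand counter 0 to
event 0 and leave event 1 stuck. Going by weight, event 1 gets counter 0
first and event 0 ends up on counter 1.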

It all gets a bit funny due to overlapping constraints; see commit
bc1738f6ee830 for a little blurb on what the overlap thing is about.


So when we add a new event (or more) we compute a mapping from event to
counter. Then we disable all (pre-existing) events that moved to a new
location, then we enable all events (insert HES_ARCH) that were running
but got relocated, plus the new events.
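
Roughly, as a user-space sketch of the two passes (again not the kernel
code: struct sketch_hw_event and sketch_reprogram() are made-up names, the
HES_ARCH handling is left out entirely, and the events that were already
scheduled are assumed to come first in the array):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-event state for the sketch. */
struct sketch_hw_event {
	int  idx;	/* counter currently programmed, -1 if none yet */
	bool running;	/* true while the counter is counting */
	int  starts;	/* how often this sketch started the event */
	int  stops;	/* how often this sketch stopped the event */
};

/*
 * ev[0..n_running-1] were scheduled before, ev[n_running..n_events-1]
 * are newly added. new_idx[i] is the freshly computed counter for ev[i].
 */
static void sketch_reprogram(struct sketch_hw_event *ev, const int *new_idx,
			     int n_running, int n_events)
{
	assert(n_running <= n_events);

	/* Pass 1: stop every pre-existing event that has to move. */
	for (int i = 0; i < n_running; i++) {
		if (ev[i].idx == new_idx[i])
			continue;	/* stays put, leave it counting */
		ev[i].running = false;
		ev[i].stops++;
	}

	/* Pass 2: start the relocated events and the new ones. */
	for (int i = 0; i < n_events; i++) {
		if (i < n_running && ev[i].idx == new_idx[i])
			continue;	/* untouched by pass 1 */
		ev[i].idx = new_idx[i];
		ev[i].running = true;
		ev[i].starts++;
	}
}
```

Doing all the stops before any start presumably matters because a moved
event may land on a counter that another moved event has not vacated yet.
For two running events on counters 0 and 1 plus one new event, with a new
mapping of {0, 2, 1}, the first event is never touched, the second is
stopped once and restarted on counter 2, and the new event is started on
counter 1.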

Of course the code is horrible, but I think the above is what it does.