Re: [PATCH 3/3] x86: UV2 BAU hang workarounds

From: Ingo Molnar
Date: Mon Jun 25 2012 - 08:40:47 EST



* Cliff Wickman <cpw@xxxxxxx> wrote:

> On Mon, Jun 25, 2012 at 12:03:21PM +0200, Ingo Molnar wrote:
> >
> > * Cliff Wickman <cpw@xxxxxxx> wrote:
> >
> > > On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang under a
> > > heavy load. To cure this:
> > >
> > > - Disable the UV2 extended status mode (see UV2_EXT_SHFT), as this
> > > mode changes BAU behavior in more ways than just delivering an extra bit
> > > of status. Revert status to just two meaningful bits, like UV1.
> > > - Use no IPI-style resets on UV2. Just give up the request, whatever the
> > > reason it failed, and let it be accomplished with the legacy IPI method.
> > > - Use no alternate sending descriptor (the former UV2 workaround
> > > bcp->using_desc and handle_uv2_busy() stuff). Just disable the use of the
> > > BAU for a period of time in favor of the legacy IPI method when the h/w bug
> > > leaves a descriptor busy.
> > > -- new tunable: giveup_limit determines the threshold at which a hub is
> > > so plugged that it should do all requests with the legacy IPI method for a
> > > period of time
> > > -- generalize disable_for_congestion() (renamed disable_for_period()) for
> > > use whenever a hub should avoid using the BAU for a period of time
> > >
> > > Misc:
> > > - fix find_another_by_swack(), which is part of the UV2 bug workaround
> > > - correct and clarify the statistics (new stats s_overipilimit s_giveuplimit
> > > s_enters s_ipifordisabled s_plugged s_congested)
> >
> > Sigh, it looks like something that ought to be 7 successive,
> > easy to review commits got mixed up into a single, huge, hard to
> > review commit. How did that happen?
> >
> > Thanks,
> >
> > Ingo
>
> Hi Ingo,
>
> Yes, admittedly large.
> This patch was the 'bottom line' of a great deal of experimentation on
> how to work around some hardware problems with the BAU. This is what
> remains after pulling out the unnecessary or unhelpful attempts.

Ok - this happens sometimes.

> I could break it up for review purposes, if you think anyone would
> want to examine each component.
> You sound like you're willing to spend that time and effort. Yes?

I had a look already and it didn't look fundamentally
objectionable - besides its size. As long as it wasn't actually
the result of merging multiple patches I'll apply it to
tip:x86/uv. If there's a problem with the patch we could still
break it up and re-try.

Thanks,

Ingo