Re: [PATCH 2/3] pci: Clamp pcie_set_readrq() when using"performance" settings

From: Benjamin Herrenschmidt
Date: Wed Oct 05 2011 - 03:03:12 EST

On Tue, 2011-10-04 at 10:36 -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2011 at 10:30 AM, Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Patches 1 and 2 fix it to do what it's supposed to.
> Maybe.
> Or maybe not.


> We just don't the hell know, do we?

Well, we do with some confidence :-) Or rather what we do know is what
you have today in your tree is broken.

Let's break the story back into simple steps, I think there's a lot of
confusion about what's there, what not, etc... :

- I came up with a crackpot configuration scheme for MPS based on what
pHyp does on our machines. Whether that's scheme is suitable for the
general case or not or even useful is the root of the debate between Ben
and I (I originally intended to have this inside arch/powerpc).

- Jon was trying to fix MPS related issues (BIOS doing the wrong thing
on some machines) at the same time in generic code.

- I described my scheme to Jon, who implemented it as the "performance"
option, he also implemented a "safe" option which does the classic
algorithm of clamping everbody down the the lowest common MPS.

- Jon original patch (which went upstream) had a few problems. Among
others it made "performance" the default, and he didn't completely
understand what I wanted, ie, his implementation of "performance" was in
fact totally broken (it would try to configure devices with MPS larger
than what they can support for example). So that was a trainwreck.

- I was travelling and didn't get to review things properly before it
went upstream.

- Jon eventually got to fix things in most cases by switching the
default to "safe" rather than "performance" (as it should have been to
start with). This exposed different problems related to even "safe" not
working well on some chipsets due to HW errata. This is what you have
upstream today.

- Jon patches 3 switches things to "don't touch" by default. We agree
that's what we want for 3.1. However, whatever way you look at it,
"performance" is still totally broken (he didn't implement my algorithm,
but something "else" that cannot work and never worked).

- Patches 1 and 2 only affect the "performance" case and "fix" it to do
what I want.

So patches 1 and 2 are about taking the existing busted & known to not
work "performance" option in your tree and turning it into what I
originally wanted it to be, which will work at least for some cases
(tested !).

Now you may prefer to just remove the code for "performance" completely,
I wouldn't object.

But don't leave it there in a known totally busted state.... at least
that's my preference.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at