Re: [GIT PULL] omap changes for v2.6.39 merge window

From: Nicolas Pitre
Date: Wed Mar 30 2011 - 19:32:06 EST


On Wed, 30 Mar 2011, Linus Torvalds wrote:

> On Wed, Mar 30, 2011 at 1:41 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> >
> > If in your mind "competitors" == "morons" then you might be right.
>
> There's a difference between "competition" and "do things differently
> just to be difficult".

Absolutely. We've seen that from some proprietary software companies.

> > Trying to rely on bootloaders doing things right is like saying that x86
> > should always rely on the BIOS doing things right.
>
> No. Not at all.
>
> The problem with firmware/BIOS is that it's set in stone and closed-source.
>
> I'm suggesting splitting out the crazy part into a separate project
> that does this. Open-source. Like a mini-kernel. Because the thing is,
> the main kernel doesn't care, and _shouldn't_ care. Those board files
> are just noise.

Sure, but important noise nevertheless. As long as the noise is
confined to a limited set of .c files I'm happy. OTOH I have very
little hope for a separate project that would only deal with that noise.
That will simply never fly, even less so as an Open Source project.
The incentive for people to work on such a thing simply isn't there, as
that is totally uninteresting and without any reward.

Furthermore, this does create pain. You have to keep things in sync
between the kernel and the mini-kernel (let's call it bootloader). In
practice the bootloader is always maintained separately from the kernel,
at its own pace and with its own release schedule. Trying to
synchronize independent projects is really painful as you know already,
otherwise the user space for perf would still be maintained separately
from the kernel, right?

Now, when there is a bug in one of the clock settings, or one clock
is missing for that new kernel driver to work properly, the
bootloader would have to be fixed, revalidated, and the fix deployed
separately but still in addition to the kernel. This process still adds
to the pain such that what people do in those cases is simply to hack
the driver code in the kernel. Instead, the OMAP folks created a table
to abstract those clocks into something more manageable.

And here's the final catch. Most of those clocks are often derived from
each other in a tree structure inside the SOC. And for power saving
reasons, some crazy people want to dynamically change the config for
those clocks at run time according to the required frequency for given
loads, turn them off when possible, and of course turn the parent clock
off as well if all the children clocks are themselves turned off. So
the kernel has NO CHOICE but to be fully aware of them.
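
To make the parent/child gating concrete, here is a minimal sketch of
use-counted clocks in a tree. It is an illustration only: struct
clk_node, clk_node_enable() and clk_node_disable() are hypothetical
names, not the actual OMAP clock code, and the hw_on flag stands in for
a real gate register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock node; not the real OMAP or kernel clock structure. */
struct clk_node {
    const char *name;
    struct clk_node *parent;   /* NULL for a root clock */
    unsigned int usecount;     /* active users, enabled children included */
    int hw_on;                 /* stand-in for the real gate register */
};

/* Enable a clock, turning on its idle ancestors first. */
static void clk_node_enable(struct clk_node *clk)
{
    if (clk == NULL)
        return;
    if (clk->usecount++ == 0) {
        clk_node_enable(clk->parent);
        clk->hw_on = 1;
    }
}

/* Drop one user; on the last one, gate the clock and release the parent. */
static void clk_node_disable(struct clk_node *clk)
{
    if (clk == NULL)
        return;
    assert(clk->usecount > 0);   /* unbalanced disable is a caller bug */
    if (--clk->usecount == 0) {
        clk->hw_on = 0;
        clk_node_disable(clk->parent);
    }
}
```

With two children under one root, the root stays on until the second
child is disabled, and only then is it gated. Doing that bookkeeping
requires the full tree to be known to whoever handles the enable and
disable requests at run time.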

Then come power domains with the cascade of regulators and so forth,
again all software controlled. Add to the mix the different sleep
states that can be derived from that, which is far more sophisticated
than ACPI states on Intel. And in some cases, the hardware capabilities
are there but people still haven't found the optimal way to drive them,
so research is still ongoing software-wise. And obviously those SOC
vendors do compete on that front since power consumption is the killer
weapon these days. No wonder they are so different from each other
with all that "board crap".

> The long-term situation should be that you should be able to have ONE
> binary kernel "just work". That's where we are on x86. Really.

But x86 is peanuts. Really. There was one machine called the IBM PC at
some point that everybody cloned, and the rest was totally irrelevant.
Then came that thing called Windows that reinforced this hardware
monoculture as it was used for the ultimate conformance testing. It
is damn easy in that case to produce a kernel that works virtually
everywhere.

On ARM there is simply no such thing as a single machine design to
clone, nor a closed source test bench to design for.

And this is orthogonal to this discussion anyway, as having in-kernel
clock tables is not incompatible with a single kernel binary. Dropping
at runtime those clock tables that are irrelevant to the currently
running hardware is not rocket science.
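
As a rough sketch of what that could look like: tag each table entry
with the SoCs it applies to, and compact the table once the running
hardware has been identified. Everything here is hypothetical (the
SOC_A/SOC_B bits, struct clk_entry, clk_table_prune()); it is not how
the OMAP code does it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SoC identifiers, one bit each. */
#define SOC_A (1u << 0)
#define SOC_B (1u << 1)

/* Hypothetical table entry: a clock name plus the SoCs that have it. */
struct clk_entry {
    const char *name;
    unsigned int soc_mask;
};

/*
 * Compact the table in place so that only entries matching the running
 * SoC remain at the front. Returns the number of entries kept; the tail
 * of the array is then unused and its memory could be released.
 */
static size_t clk_table_prune(struct clk_entry *table, size_t n,
                              unsigned int running_soc)
{
    size_t kept = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].soc_mask & running_soc)
            table[kept++] = table[i];
    }
    return kept;
}
```

A single binary would carry the tables for every supported SoC and call
something like this once early at boot, after detecting which one it
is running on.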

> Without that kind of long-term view, where do you think ARM is going
> to be in five years?

ARM is going to still be relevant simply because they now have Linux
that they can modify to suit their latest changes. That's one thing
with Open Source which can be good or bad: full hardware compatibility
is no longer an issue since the software can be adapted at will.

Still... there are on-going efforts to consolidate things amongst all
the ARM vendors. The ARM architecture is standardizing more and more
stuff in the whole stack in every revision. But they won't standardize
everything, otherwise they'll kill that competing ecosystem.

> >> almost *SIXTY* percent of all arch updates were to ARM code.
> >
> > Absolutely not!  You have 14% going to OMAP code which happens to be
> > under arch/arm/ but there is nothing ARM specific in there.  If OMAP was
> > using a PPC or a MIPS core then you'd have the same result except under
> > arch/powerpc or arch/mips.  There is very little in terms of ARM
> > specific peculiarities under arch/arm/mach-omap2/ in fact.
>
> But that's my point - the problem is all the crazy board crap.
>
> I've never claimed that this is about the ARM cpu (which has it's own
> issues, but that's a separate rant entirely). It's about the broken
> infrastructure.

Let's see how we can fix it then. Trying to shovel the problem away
won't help the situation. Those ARM vendors are crazy for sure. But
a relatively small number of merge conflicts, compared to the volume
of changes, is not going to make us flinch, right?

> Now, some of it is quite understandable - ie real drivers for real
> hardware. But a _lot_ of it seems to be just descriptor tables, and
> I'm getting the very strong feeling that ARM people aren't even
> _trying_ to make it sane, and trying to standardize things, or trying
> to aim for the whole notion of "one kernel image, with much more hw
> description done elsewhere".

That work is happening. It is not ready. I'm not against it but I
remain sceptical. I still think that a self-contained kernel is more
maintainable.

Still, because ARM is just a CPU architecture, those SOC vendors will
always have something new to differentiate themselves from the other SOC
vendors. And that cannot be described in a table alone. The power
management hardware from TI will still require separate _executable_
code from the Freescale one, or the Samsung one, or the Nvidia one, or
the Qualcomm one, or the Marvell one, yada yada. And I really don't
want to see that code turned into some vendor provided buggy ACPI
bytecode or similar.

> arch/arm is already about 3x the size of arch/x86. And it's pretty
> much all the crazy infrastructure afaik. timer chips, irq chips, gpio
> differences - crap like that.

Indeed. And I expect it to grow even bigger. Be warned.

> And the fact that you don't even seem to UNDERSTAND the problem, and
> think that it's ok, and that continued future explosion of this is all
> fine makes me even more nervous.

I do understand the problem. And so far, the way we scaled is to have
TI people care about the OMAP code, Freescale people care about the iMX
code, and so on. If one of them produces crap code then so be it, and
the other vendor is totally unaffected, which is why I'm not too
nervous. Blaming a merge conflict on the entire ARM ecosystem just
because one team was large enough to have separate people doing
different things that intersected into the clock table is blowing things
totally out of proportion.

And if those hardware vendors are still in business in the future, and
apparently new ones are joining in, then the arch/arm/ directory will
continue to gain weight. Linux is very, very successful on ARM,
that's all.


Nicolas