Re: [PATCH] ARM: allow, but warn, when issuing ioremap() on RAM

From: Felipe Contreras
Date: Sat Oct 09 2010 - 06:00:56 EST


On Sat, Oct 9, 2010 at 6:36 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Fri, 8 Oct 2010, Greg KH wrote:
>> Wait, let me get this straight:
>> Â - drivers used to work on 2.6.35
>
> Possibly on pre ARMv6 only. ÂOn ARMv6 and above, the discussed behavior
> _never_ worked. ÂIt is fundamental to the CPU design.

That's not true, drivers on ARMv6 and above do work. I still wonder
how exactly to trigger the issue, and how often does this happen,
because I've never seen it.

I trust the experts that it is an issue, but I don't agree that
drivers that are currently working on ARMv6+ should be broken on
purpose right away.

>> Â - some ARM core code changed in .36-rc to fix this iomem problem that
>> Â Â you found
>
> Nothing was fixed. ÂThere is nothing to fix. ÂSimply that a wrong driver
> assumption about the hardware capability was prohibited by the API.

Not exactly. There is supposedly a problem on ARMv6+, there must be a
mechanism to avoid this problem, all drivers should migrate to that,
or a different on, so that we have a sane API.

>> Â - no drivers are notified of the api change as it's a run-time change,
>> Â Â so the build doesn't break.
>
> There is no API change. ÂSimply an API misuse. ÂGranted, on pre ARMv6
> this usage used to work, but on ARMv6 and above this never was supported
> at all by the hardware. ÂThe fact that the software has let it through
> was a gross mistake, period.

No. It worked just fine, now it's doesn't, that's a change in API
behavior, which is also part of the API. It might not be ideal, or
proper, but it does work.

>> Â - drivers break when run as the api stops returning valid addresses
>
> Such drivers on ARMv6 should *never* have worked in the first place.
> When they did "work", they corrupted memory.

They corrupted their own memory, right? And on _very_ rare occasions I
presume, since I've never seen it. If so, there's no damage in letting
them work at least one more release (2.6.36).

>> Â - no known way is around to fix the broken drivers
>
> That's B/s. ÂAs Russell said, there are known solutions that were
> proposed at least six months ago.

You seriously think most people out there know how to "set aside some
RAM at boot time"? I don't, otherwise the drivers would be fixed by
now.

>> Um, this doesn't sound like a valid thing to be doing,
>
> Excuse me, but preventing data corruption is a damn valid thing to do.
> The real problem is that people kept on abusing the API for something
> that the hardware simply can't support, and the mere fact that things
> _appears_ to just work by accident in 98% of the cases encourage people
> to simply put their head in the sand and ignore the issue. ÂSorry folks,
> but that's not how computing works. ÂWe're not amateurs for damn sake,
> and when the ARM bible says this operation is UNSUPPORTED, it is
> UNSUPPORTED. ÂSeeing people trying to fsck around the issue because of
> laziness is revolting!

Have you ever seen this corruption? How often does this happen? So, a
driver would generate corruption in it's own data and your solution is
to make the driver stop working? That's going from 98% to 0% in one
release.

There's many people out there that have realized these issues, and the
only thing they would see is their driver broken on .36. How about
instead we just warn for now? What's wrong with that? There have been
many releases with the "wrong" ioremap(), one more would not hurt
anybody.

>> how do you expect
>> people to fix their code if they:
>> Â Â Â - don't realize it as the api change doesn't break the build
>
> The API should have complained the very first time they attempted the
> operation in question. ÂIt should never have worked. ÂI'm sorry but the
> mere fact that the broken driver didn't receive an error before is no
> excuse to *knowingly* let memory corruption go on.

Yes, it should have, but it didn't. Now people are relying on that
behavior, so, first you deprecate that usage with WARN_ON, and later
you remove that functionality.

>> Â Â Â - there is no way to fix their code
>
> Wrong. ÂSee above.
>
>> This sounds like a huge regression that should be reverted, or am I
>> missing something here?
>
> I'm afraid you are totally missing the point.
>
> The patch Russell introduced simply prevented an abuse of the API for
> which there are documented evidences that such abuse is a source of
> memory corruption. ÂThis must not be allowed any longer, not even for a
> warning period. ÂThere is no regression what so ever as those driver
> were always broken. ÂAnd I much prefer to have brokenness in the form of
> driver initialization error rather than silent and random memory
> corruptions.

People have different definitions of "broken"; working 99% of the time
is good enough for many people, specially compared to 0%.

So you found an issue on ARMv6+, great, let's warn the users so that
they know the issue exists, hopefully a generic mechanism to fix this
will come about for .37 so that some drivers can migrate to that, and
then truly disable the functionality.

--
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/