Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

From: Josh Boyer
Date: Tue Aug 05 2014 - 07:31:40 EST


On Wed, Jul 30, 2014 at 11:47 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek <jakub@xxxxxxxxxx> wrote:
>>
>> IMNSHO this is a too big hammer approach. The bug happened on a single
>> file only (right?)
>
> Very dubious. We happened to see it in a single case, and _maybe_ that
> was the only one in the whole kernel. But it's much more likely that
> it wasn't - it's not like the code in question was even all that
> unusual (just a percpu access triggering an asm - but we have tons of
> asms in the kernel).
>
> I'd argue that we were very lucky to get the problem happening
> reliably enough for a couple of people who then cared enough to do
> good bug reports (considering that it needed an interrupt in *just*
> the right place) that we could debug it at all. In some code that gets
> run much less than the scheduler, it could easily have been one of
> those "people report it once in a blue moon, looks like memory
> corruption".
>
> Now, it would be interesting to hear if there is something very
> special that made that instruction scheduling bug trigger just for
> 4.9.x, or if there is something else that made it very particular to
> that code sequence. But in the absence of good reasoning to the
> contrary, I'd much rather say "let's just avoid the bug entirely".
>
> And that's partly because we really don't care that much about the
> debug info. Yes, it gets used, but it's not *that* common, and the
> last time the issue of debug info sucking up tons of resources came
> up, the biggest users were people who just wanted line information for
> oopses. Yes, there are people running kgdb etc, but on the whole it's
> rare, and quite frankly, from everything I have _ever_ seen, that's
> not how the real kernel bugs are ever really discovered. So the kind
> of debug information that the variable tracking logic adds just isn't
> all that important for the kernel.

Sorry to bring this back up after the fact, but it's important for a
number of things in various distros. I don't disagree that it should be
disabled by default, but making it unconditional is going to force the
distributions that care about perf, systemtap, and debuggers to
manually revert this. That deviation is concerning because the
upstream kernel won't easily be buildable the same way distros build
it.

I'm happy to come up with a config option patch, but I'm not sure if
it would be accepted. Is that a possibility at this point?

josh