Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop

From: Ingo Molnar
Date: Thu Apr 03 2014 - 02:44:11 EST



* Igor Mammedov <imammedo@xxxxxxxxxx> wrote:

> > I've seen that. Kernel still boots. With your patch it would hang.

Nonsense, not booting is OK when critical hardware is genuinely bad -
this isn't a disk drive or networking where bad IO 'happens sometimes'
and failure is something we have to engineer for - this is the CPU!

If a critical piece of hardware like the CPU or RAM is non-functional
then it should be excluded by the user explicitly, not worked around
after some ugly, non-deterministic and fragile timeout.

The timeout in the SMP bringup code was really an ancient property,
introduced more than a decade ago, when hardware makers were
ignorant of Linux and we were ignorant of how to properly interface
with SMP hardware.

Today a 'timeout' means one of 3 things:

 - bad, fragile hardware - this we don't want to hide, unless
   explicitly told so by the user. I've seen such symptoms related to
   overclocking for example - so not booting is perfectly justified,
   it can prevent reporting a bogus kernel crash down the line.

 - buggy SMP bringup. That is a bug that needs to be fixed, not
   worked around.

- timeout fragility in virtualized environments

I'm not aware of any genuine case where timing out is the correct
thing to do.
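
For illustration only, the difference between the two waiting styles
can be sketched in userspace C. This is not the x86 bringup code:
struct fake_cpu, cpu_is_online(), wait_with_timeout() and
wait_until_online() are all hypothetical names, and the "CPU" is
simulated by a poll counter that reports online after a fixed number
of polls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a secondary CPU: it reports "online"
 * only once it has been polled more than ready_after times. */
struct fake_cpu {
	unsigned long polls;
	unsigned long ready_after;
};

static bool cpu_is_online(struct fake_cpu *cpu)
{
	return ++cpu->polls > cpu->ready_after;
}

/* Timeout style: give up after max_polls attempts. A CPU that is
 * merely slow is treated the same as one that is broken. */
static int wait_with_timeout(struct fake_cpu *cpu, unsigned long max_polls)
{
	for (unsigned long i = 0; i < max_polls; i++) {
		if (cpu_is_online(cpu))
			return 0;
	}
	return -1;
}

/* Infinite-wait style: return only once the CPU is online. A CPU
 * that never comes up makes this loop spin forever, so the failure
 * is visible instead of being papered over. */
static int wait_until_online(struct fake_cpu *cpu)
{
	while (!cpu_is_online(cpu))
		;
	return 0;
}
```

With ready_after set to 10, wait_with_timeout(&cpu, 5) reports failure
even though the simulated CPU would have come up a few polls later,
while wait_until_online() succeeds on the 11th poll.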

So the patches look fine to me as-is, I planned on looking at them
more closely after the merge window.

Thanks,

Ingo