Re: [BUG] x86: reboot doesn't reboot

From: Matthew Garrett
Date: Fri Apr 04 2014 - 12:21:33 EST


On Fri, Apr 04, 2014 at 09:09:32AM -0700, Linus Torvalds wrote:
> On Fri, Apr 4, 2014 at 8:45 AM, Matthew Garrett <mjg59@xxxxxxxxxxxxx> wrote:
> >
> > Which is almost certainly because the other reboot methods are trapping
> > into SMI and hitting some hardware that we've left in a different state
> > to Windows.
>
> Why are you making up these completely invalid arguments? Because you
> are making them up.
>
> Our default reboot type is REBOOT_ACPI. That's the one we try *first*.
> There are no "other reboot methods" playing games.

We try ACPI. That will occasionally work, but not often. We then try the
keyboard controller. That will generally work. We then try ACPI again,
which will typically work because the ACPI reset register often points
at CF9, so this is now the second write to it. We then try the keyboard
controller again, because that's what Windows does. The machine should
now have rebooted.
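
As a rough sketch of that ordering only (the names below are hypothetical
and are not the kernel's identifiers; the "firmware" is a simulated
callback, not real port I/O), the fallback loop looks something like this:

```c
#include <stddef.h>

/* Hypothetical method names for illustration; not the kernel's enum. */
enum reboot_method { METHOD_ACPI, METHOD_KBD };

/* Order described above: ACPI, keyboard controller, ACPI, keyboard controller. */
static const enum reboot_method reboot_sequence[] = {
    METHOD_ACPI, METHOD_KBD, METHOD_ACPI, METHOD_KBD,
};

#define REBOOT_SEQUENCE_LEN \
    (sizeof(reboot_sequence) / sizeof(reboot_sequence[0]))

/* Simulated attempt: returns nonzero if this attempt reset the machine. */
typedef int (*attempt_fn)(enum reboot_method method, size_t index, void *ctx);

/* Walk the sequence; return the index of the attempt that worked, or -1. */
static int run_reboot_sequence(attempt_fn attempt, void *ctx)
{
    for (size_t i = 0; i < REBOOT_SEQUENCE_LEN; i++) {
        if (attempt(reboot_sequence[i], i, ctx))
            return (int)i;
    }
    return -1;
}

/*
 * Assumed firmware behaviour for the demo: only the second ACPI attempt
 * resets the machine (standing in for "the second write to CF9").
 * ctx points at an int counting ACPI attempts so far.
 */
static int second_acpi_only(enum reboot_method method, size_t index, void *ctx)
{
    int *acpi_attempts = ctx;

    (void)index; /* unused in this simulated firmware */
    if (method != METHOD_ACPI)
        return 0;
    (*acpi_attempts)++;
    return *acpi_attempts == 2;
}
```

With that simulated firmware, run_reboot_sequence(second_acpi_only, &count)
stops at the third attempt (index 2). Nothing here models what the firmware
does in SMM in response to each access, which is the part we can't see.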

But *any* of those accesses could have generated an SMI. For all we know
the firmware is running huge quantities of code in response to any of
those register accesses. We don't know what other hardware that code
touches. We don't know what expectations it has. We don't know whether
it was written by humans or written by some sort of simulated annealing
mechanism that finally collapsed into a state where Windows rebooted and
then shipped (or even humans behaving indistinguishably from a simulated
annealing mechanism).

> And given this *fact*, your denial that "PCI reboot should never be
> used" is counterfactual. It may be true in some theoretical "this is
> how the world should work" universe, but in the real world it is just
> BS.
>
> Why are you so deep in denial about this?

Because Windows doesn't use CF9 but machines reboot anyway. That
shouldn't be a controversial point of view. We know that CF9 fixes some
machines. We know that it breaks some machines. We don't know how many
machines it fixes or how many machines it breaks. We don't know how many
machines are flipped from a working state to a broken state whenever we
fiddle with the order or introduce new heuristics. We don't know how
many go from broken to working. The only way we can be reasonably
certain that hardware will work is to duplicate precisely what Windows
does, because that's all that most vendors will ever have tested.

The problem is that while we may know exactly what Windows does in terms
of actually triggering the reboot, we don't know everything else it does
on the shutdown path, and it's difficult to instrument things like
embedded controller accesses when qemu doesn't emulate one. I'll contact
the people I know at Dell and see if I can find anyone from the firmware
division who'll actually talk to us.

--
Matthew Garrett | mjg59@xxxxxxxxxxxxx