Re: [PATCH][v8] PM / hibernate: Verify the consistent of e820 memory map by md5 value

From: Pavel Machek
Date: Wed Aug 31 2016 - 07:03:39 EST


Hi!

> >> There may be problems going forward, but whether or not they actually happen
> >> depends on what the differences are. So while an e820 mismatch indicates that
> >> things may go wrong, it doesn't necessarily mean that they will.
> >
> > Well "memory won't get corrupted right away" seems like good reason to
> > panic the machine ASAP.
> >
> > You can flip some bits in memory, and it may not cause problems. Still
> > if you know some bits in memory were flipped, you'd better panic the
> > machine. Continuing is unsafe.
> >
> > If you could guarantee that machine will panic down the line, and not
> > something worse, you'd be right.
> >
> > But at least the case where there is _less_ memory available after
> > resume, kernel will write into BIOS reserved memory and bad things
> > will happen. Yes, it usually panics, but it is quite clear it could
> > corrupt memory, too.
>
> That depends a good deal on what those ranges were reserved for.
> There very well may not be anything vital in there.

Umm. Yes, you can also flip some bits in memory, and not hit anything
vital.

> >> Also, that panic() may cause hibernation to stop working in a sort of hard and
> >> nasty way where it used to work flawlessly previously and that would be a
> >> regression, so not really acceptable.
> >
> > Well, turning memory corruption bug into panic is an improvement, not
> > a regression.
>
> Since we don't do anything about these problems today and presumably
> people use hibernation on the affected systems, there are reasons to
> think that the problem is not quite as grave as you're painting it.
>
> But that aside, adding a panic() like in this patch isn't particularly
> useful anyway, because it panics the restore kernel. It is sufficient
> to make arch_hibernation_header_restore() return an error to actually
> fail the resume and cause the restore kernel to discard the image.
> And that would preserve the information about the failure in the
> kernel log at least.

I don't think people are using hibernation today on affected systems:
they are getting random oopses/panics. That's how this thread started.

Anyway, I agree that failing the resume is preferable to panic().
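
For illustration, here is a minimal user-space sketch of that idea. It
assumes the image header carries an MD5 digest of the e820 map, as the
patch proposes. The struct layout, the function name
header_restore_check() and the choice of -ENODEV are hypothetical and
not the kernel's actual code. Only the pattern of returning an error
instead of calling panic() comes from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MD5_DIGEST_SIZE 16

/* Hypothetical stand-in for the arch-specific part of the image header. */
struct image_header {
	unsigned char e820_digest[MD5_DIGEST_SIZE]; /* saved at hibernation time */
};

/*
 * Hypothetical restore-time check: compare the digest stored in the image
 * with the digest of the memory map seen by the restore kernel.
 * Returns 0 on match and -ENODEV on mismatch (error code chosen only for
 * illustration).
 */
static int header_restore_check(const struct image_header *hdr,
				const unsigned char *current_digest)
{
	assert(hdr && current_digest);

	if (memcmp(hdr->e820_digest, current_digest, MD5_DIGEST_SIZE)) {
		/* Log the reason and fail the resume instead of panicking. */
		fprintf(stderr,
			"hibernate: e820 memory map mismatch, refusing to resume\n");
		return -ENODEV;
	}
	return 0;
}
```

With an error return like this, the caller can discard the image and
continue with a normal boot, so the message survives in the log.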

Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html