Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)

From: Rafael J. Wysocki
Date: Fri Nov 22 2013 - 16:55:44 EST


On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
> On 11/22/2013 01:54 PM, Rafael J. Wysocki wrote:
> > On Friday, November 22, 2013 10:57:25 AM Francis Moreau wrote:
> >> On 22/11/2013 08:43, Francis Moreau wrote:
> >>> On 21/11/2013 12:17, Jingoo Han wrote:
> >>> [...]
> >>>>>
> >>>>>> Also I took a look at the changes between v3.11 and v3.12 in this area
> >>>>>> and those changes match the issue I'm facing:
> >>>>>>
> >>>>>> $ git log --oneline v3.11..v3.12 -- drivers/mfd/rtsx_pcr.c
> >>>>>> 09fd867 mfd: rtsx: Copyright modifications
> >>>>>> eb891c6 mfd: rtsx: Configure to enter a deeper power-saving mode in S3
> >>>>>> 7140812 mfd: rtsx: Move some actions from rtsx_pci_init_hw to individual extra_init_hw
> >>>>>> 5947c16 mfd: rtsx: Add shutdown callback in rtsx_pci_driver
> >>>>>> 773ccdf mfd: rtsx: Read vendor setting from config space
> >>>>
> >>>> In my opinion, rtsx_pci_resume()/rtsx_pci_suspend() in the Realtek PCIe
> >>>> card reader driver may be causing the kernel panic.
> >>>>
> >>>> I think that the commit "mfd: rtsx: Configure to enter a deeper
> >>>> power-saving mode in S3" may be the culprit.
> >>>
> >>> Unfortunately no, reverting this commit on top of v3.12 doesn't help. I
> >>> also reverted 7140812 and 5947c16, but that didn't help either.
> >>>
> >>> The good news is that I managed to put together a "light" kernel
> >>> configuration which is faster to build and, more importantly, the bug
> >>> now seems to be almost 100% reproducible.
> >>>
> >>> So I'll try to do another git-bisect session later.
> >>
> >> So after bisecting the v3.11..v3.12 range, git bisect told me:
> >>
> >> the first bad commit is 551f5c74e17ba9257cdc35bf657ee448cad2d5b0
> >>
> >> Merge branch 'acpi-processor'
> >>
> >> * acpi-processor:
> >> ACPI / processor: Acquire writer lock to update CPU maps
> >> ACPI / processor: Remove acpi_processor_get_limit_info()
> >>
> >> The two commits brought in by the merge are not the culprits, because
> >> resetting HEAD to "ACPI / processor: Acquire writer lock to update CPU
> >> maps" no longer shows the issue.
> >>
> >> At this point I'm not sure how to bisect further.
> >
> > Does the second parent of this merge (that is, 8462d9df9d50) have the problem?
> >
>
> Yes it does.
>
> Ok, I've finally managed to find out the bad commit:
> ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
> over system PM transitions
>
> I verified that the parent commit doesn't have the problem.

Interesting.

> Rafael, you're the man now ;)

I don't quite see how that commit could result in the behavior you
described earlier in the thread.

You're seeing memory corruption that appears to have started because we
now hold an additional lock over suspend/resume. Something's fishy on that
machine and we need to figure out what it is.

Please file a bug at bugzilla.kernel.org against ACPI and assign it to me.
Please put all of the relevant info in there and attach the output of dmesg
after a fresh boot and the output of acpidump from the affected machine to
the bug entry.

Thanks,
Rafael