Re: linux-next: Tree for Apr 9 (x86 boot problem)

From: Mike Rapoport
Date: Tue Apr 13 2021 - 14:23:15 EST


On Tue, Apr 13, 2021 at 10:34:25AM -0700, Randy Dunlap wrote:
> On 4/13/21 9:58 AM, Mike Rapoport wrote:
> > On Mon, Apr 12, 2021 at 11:21:48PM -0700, Randy Dunlap wrote:
> >> On 4/12/21 11:06 PM, Mike Rapoport wrote:
> >>> Hi Randy,
> >>>
> >>> On Mon, Apr 12, 2021 at 01:53:34PM -0700, Randy Dunlap wrote:
> >>>> On 4/12/21 10:01 AM, Mike Rapoport wrote:
> >>>>> On Mon, Apr 12, 2021 at 08:49:49AM -0700, Randy Dunlap wrote:
> >>>>>
> >>>>> I thought about adding some prints to see what's causing the hang, the
> >>>>> reservations or their absence. Can you replace the debug patch with this
> >>>>> one:
> >>>>>
> >>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >>>>> index 776fc9b3fafe..a10ac252dbcc 100644
> >>>>> --- a/arch/x86/kernel/setup.c
> >>>>> +++ b/arch/x86/kernel/setup.c
> >>>>> @@ -600,10 +600,13 @@ static bool __init snb_gfx_workaround_needed(void)
> >>>>> return false;
> >>>>>
> >>>>> vendor = read_pci_config_16(0, 2, 0, PCI_VENDOR_ID);
> >>>>> + devid = read_pci_config_16(0, 2, 0, PCI_DEVICE_ID);
> >>>>> +
> >>>>> + pr_info("%s: vendor: %x, device: %x\n", __func__, vendor, device);
> >>>>
> >>>> s/device)/devid)/
> >>>
> >>> Oh, sorry.
> >>>
> >>>>> +
> >>>>> if (vendor != 0x8086)
> >>>>> return false;
> >>>>>
> >>>>> - devid = read_pci_config_16(0, 2, 0, PCI_DEVICE_ID);
> >>>>> for (i = 0; i < ARRAY_SIZE(snb_ids); i++)
> >>>>> if (devid == snb_ids[i])
> >>>>> return true;
> >>>>
> >>>> That prints:
> >>>>
> >>>> [ 0.000000] snb_gfx_workaround_needed: vendor: 8086, device: 126
> >>>> [ 0.000000] early_reserve_memory: snb_gfx: 1
> >>>> ...
> >>>> [ 0.014061] snb_gfx_workaround_needed: vendor: 8086, device: 126
> >>>> [ 0.014064] reserving inaccessible SNB gfx pages
> >>>>
> >>>>
> >>>> The full boot log is attached.
> >>>
> >>> Can you please send the log with memblock=debug added to the kernel command
> >>> line?
> >>>
> >>> Probably should have started from this...
> >>>
> >>
> >> It's attached.
> >
> > Honestly, I can't see any reason why moving these reservations around would
> > cause your laptop to hang.
> > Let's try moving the reservations back to their original place one by
> > one, e.g. something like this:
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 776fc9b3fafe..892ad20b8557 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -632,12 +632,6 @@ static void __init trim_snb_memory(void)
> >
> > printk(KERN_DEBUG "reserving inaccessible SNB gfx pages\n");
> >
> > - /*
> > - * Reserve all memory below the 1 MB mark that has not
> > - * already been reserved.
> > - */
> > - memblock_reserve(0, 1<<20);
> > -
> > for (i = 0; i < ARRAY_SIZE(bad_pages); i++) {
> > if (memblock_reserve(bad_pages[i], PAGE_SIZE))
> > printk(KERN_WARNING "failed to reserve 0x%08lx\n",
> > @@ -1081,6 +1075,12 @@ void __init setup_arch(char **cmdline_p)
> >
> > reserve_real_mode();
> >
> > + /*
> > + * Reserve all memory below the 1 MB mark that has not
> > + * already been reserved.
> > + */
> > + memblock_reserve(0, 1<<20);
> > +
> > init_mem_mapping();
> >
> > idt_setup_early_pf();
> >
>
> Mike,
> That works.
>
> Please send the next test.

I think I've found the reason. trim_snb_memory() reserved the entire first
megabyte very early, leaving no room for the real mode trampoline
allocation. Since this reservation is needed only to make sure the
integrated gfx does not access some memory, it can safely be done after
memblock allocations are possible.
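
To illustrate the ordering problem, here is a toy userspace model. It is
not kernel code: the toy_* helpers, the fixed-size region table and the
0x6000 trampoline size are all made up for the illustration, and the
search only loosely mimics a top-down lookup below 1 MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M       (1ULL << 20)
#define MAX_REGIONS 16

/* Hypothetical, much simplified stand-in for the reserved region list. */
struct region { uint64_t base, size; };

static struct region reserved[MAX_REGIONS];
static size_t nr_reserved;

/* Forget all reservations (start of a simulated boot). */
void toy_reset(void)
{
	nr_reserved = 0;
}

/* Toy analogue of memblock_reserve(): mark [base, base + size) as taken. */
void toy_reserve(uint64_t base, uint64_t size)
{
	assert(nr_reserved < MAX_REGIONS);
	reserved[nr_reserved++] = (struct region){ base, size };
}

/* True if [base, base + size) overlaps no reserved region. */
bool toy_is_free(uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < nr_reserved; i++) {
		uint64_t rb = reserved[i].base;
		uint64_t re = rb + reserved[i].size;

		if (base < re && rb < base + size)
			return false;
	}
	return true;
}

/*
 * Toy top-down search for a free, aligned range in [0, limit).
 * align must be a power of two. Returns 0 when nothing fits
 * (address 0 itself is never handed out).
 */
uint64_t toy_alloc_below(uint64_t limit, uint64_t size, uint64_t align)
{
	if (size > limit)
		return 0;

	for (uint64_t base = (limit - size) & ~(align - 1);
	     base >= align; base -= align) {
		if (toy_is_free(base, size)) {
			toy_reserve(base, size);
			return base;
		}
	}
	return 0;
}
```

In this model, calling toy_reserve(0, SZ_1M) before
toy_alloc_below(SZ_1M, 0x6000, 0x1000) makes the allocation fail, while
doing the allocation first and the blanket reservation afterwards leaves
both in place.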

I don't know if it can be fixed on the graphics device driver side, but
from the setup_arch() perspective I think this would be the proper fix: