Re: [Intel-gfx] alderlake crashes (random memory corruption?) with 6.0 i915 / ucode related
From: Hans de Goede
Date: Mon Oct 17 2022 - 10:32:42 EST

Hi,

On 10/17/22 15:35, Jani Nikula wrote:
> On Mon, 17 Oct 2022, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>> Hi,
>>
>> On 10/17/22 13:19, Thorsten Leemhuis wrote:
>>> CCing the regression mailing list, as it should be in the loop for all
>>> regressions, as explained here:
>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>
>> Yes, sorry about that. I meant to Cc the regressions list, not you
>> personally, but the auto-completion picked the wrong address-book entry
>> (and I did not notice this).
>>
>>> On 17.10.22 12:48, Hans de Goede wrote:
>>>> On 10/17/22 10:39, Jani Nikula wrote:
>>>>> On Mon, 17 Oct 2022, Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> wrote:
>>>>>> On Thu, 13 Oct 2022, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
>>>>>>> With 6.0 the following WARN triggers:
>>>>>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>>>>>
>>>>>>> drm_WARN(&i915->drm, min_size == 0,
>>>>>>> "Block %d min_size is zero\n", section_id);
>>>>>>
>>>>>> What's the value of section_id that gets printed?
>>>>>
>>>>> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
>>>>> Use hardcoded fp_timing size for generating LFP data pointers") in
>>>>> v6.1-rc1.
>>>>>
>>>>> I don't think this is the root cause for your issues, but I wonder if
>>>>> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
>>>>> already too?
>>>>
>>>> 6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
>>>> reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
>>>> crashed during boot (but not always).
>>>>
>>>> Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
>>
>> So I have been running 6.0.0 with d3a7051841f0 applied, doing a whole
>> bunch of reboots + general use, and that seems stable. Then I reverted
>> it and the very first boot of the kernel with the revert broke again,
>> so I'm pretty sure that d3a7051841f0 fixes things.
>>
>> So d3a7051841f0 seems to do more than just fix the WARN().
>
> Wow, so I guess we do screw up the parsing royally then. :o

I'm running the kernel with lockdep + list-debugging enabled, and
I could not reproduce this (not easily at least) on a standard
Fedora 6.0.0 build without those. So maybe the parsing just manages
to write out of bounds a tiny bit, which just happens to hit a list_head
somewhere ... ?
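
To make that guess concrete, here is a minimal userspace sketch (not i915
code) of how a copy that runs slightly past a buffer can clobber an
adjacent list node, and how a check in the spirit of the kernel's list
debugging would notice. struct victim, list_survives_copy() and the 0xA5
fill pattern are made up for illustration; only the shape of
struct list_head mirrors the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in with the same shape as the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical layout: a 16-byte parse buffer directly followed by a list node. */
struct victim {
	unsigned char buf[16];
	struct list_head node;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Check in the spirit of list debugging: both neighbours of head must
 * point back at head. Only head's own (intact) pointers are dereferenced;
 * the possibly corrupted fields are merely compared.
 */
static bool list_neighbours_valid(const struct list_head *head)
{
	return head->next->prev == head && head->prev->next == head;
}

/*
 * Copy len bytes into a 16-byte buffer the way a parser with a wrong
 * size would, then report whether the neighbouring list is still sane.
 */
bool list_survives_copy(size_t len)
{
	unsigned char src[sizeof(struct victim)];
	struct list_head head;
	struct victim v;

	assert(len <= sizeof(src));
	/* Assumption of this sketch: node starts right where buf ends. */
	assert(offsetof(struct victim, node) == sizeof(v.buf));

	memset(src, 0xA5, sizeof(src));
	memset(&v, 0, sizeof(v));
	list_init(&head);
	list_add(&v.node, &head);

	/*
	 * The "parser": trusts len instead of sizeof(v.buf). Copying through
	 * a pointer to the whole struct keeps the overrun inside one object.
	 */
	memcpy((unsigned char *)&v, src, len);

	return list_neighbours_valid(&head);
}
```

Under these assumptions list_survives_copy(16) returns true, while
list_survives_copy(16 + sizeof(struct list_head *)) returns false because
node.next no longer points back at the head. Without such a check the bad
pointer would only show up later, when something walks the list, which
would fit crashes that look random.
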

Either way, things look stable with d3a7051841f0, and it turns out
that Fedora already had that cherry-picked downstream in the
5.19.13 kernel, which was stable for me too.

>> So let's try to get d3a7051841f0 added to the official stable series
>> ASAP (I just noticed that Mark Pearson from Lenovo has already added it
>> to Fedora's 6.0.2 build).
>
> I think I'd also pick d3a7051841f0^ i.e. both commits:
>
> d3a7051841f0 ("drm/i915/bios: Use hardcoded fp_timing size for generating LFP data pointers")
> 4e78d6023c15 ("drm/i915/bios: Validate fp_timing terminator presence")
>
> for stable.

That sounds good, can you take care of submitting these to gkh?

Regards,

Hans