RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

From: Yu, Fenghua
Date: Tue Jan 16 2018 - 13:34:22 EST


> From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
> On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> > On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > > Vikas on vacation until end of the month. Fenghua will look into
> > > this issue.
> > >
> > > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > >>
> > >> Vikas, Fenghua can you please look at that ASAP?
> > >>
> > >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> > >>
> > >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> > >>>
> > >>>> Hi Vikas,
> > >>>>
> > >>>> A kernel bug report was opened against Ubuntu [0].  After a
> > >>>> kernel bisect, it was found that reverting the following commit
> > >>>> resolved this bug:
> > >>>>
> > >>>> commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
> > >>>> Author: Vikas Shivappa <vikas.shivappa@xxxxxxxxxxxxxxx>
> > >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> > >>>>
> > >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> > >>>>
> > >>>>
> > >>>> The regression was introduced in v4.14-rc1 and still exists
> > >>>> in current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> > >>>>
> > >>>> I was hoping to get your feedback, since you are the patch
> > >>>> author.  Do you think gathering any additional data will help
> > >>>> diagnose this issue, or would it be best to submit a revert request?
> > >>>
> > >>> That stinks like a use after free. Can you run with KASAN enabled?
> > >>>
> > >>> Thanks,
> > >>>
> > >>>    tglx
> >
> >
> > Here is some data with KASAN enabled:
> > https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
> >
> > Are there any specific logs you would like to see, or specific actions
> > executed?
>
> No, the KASAN output is pretty clear where the issue is.
>
> Thanks,
>
> tglx

Is this a Haswell-specific issue?

I ran the following test in an endless loop without any issue on Broadwell with 4.15.0-rc6 and rdt mounted:
# offline CPUs 1-87 (CPU0 stays online), then bring them back, forever
for ((;;)) do
    for ((i=1;i<88;i++)) do
        echo 0 >/sys/devices/system/cpu/cpu$i/online
    done
    echo "online cpus:"
    grep -c processor /proc/cpuinfo
    for ((i=1;i<88;i++)) do
        echo 1 >/sys/devices/system/cpu/cpu$i/online
    done
    echo "online cpus:"
    grep -c processor /proc/cpuinfo
done
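The loop above hardcodes 88 CPUs, which matches this Broadwell box but not necessarily a Haswell machine. A more portable variant could read the CPU range from sysfs instead. This is only a sketch under stated assumptions: the function names (count_cpu_list, last_cpu_in_list, online_cpus, hotplug_pass, hotplug_stress) are hypothetical, it assumes bash, root privileges, and resctrl mounted at /sys/fs/resctrl, and, like the original, it leaves CPU0 online.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original report). Assumes bash, root,
# and resctrl mounted, e.g. "mount -t resctrl resctrl /sys/fs/resctrl".

# Count the CPUs in a kernel cpulist string such as "0-3,5,7-9".
count_cpu_list() {
    local list=$1 total=0 range lo hi
    local IFS=,                      # split the list on commas only
    for range in $list; do
        if [[ $range == *-* ]]; then
            lo=${range%-*}
            hi=${range#*-}
            total=$(( total + hi - lo + 1 ))
        elif [[ -n $range ]]; then
            total=$(( total + 1 ))
        fi
    done
    echo "$total"
}

# Print the highest CPU number in a cpulist string ("0-87" -> 87).
last_cpu_in_list() {
    local list=$1
    echo "${list##*[-,]}"
}

# Print how many CPUs are currently online, according to sysfs.
online_cpus() {
    count_cpu_list "$(< /sys/devices/system/cpu/online)"
}

# One pass: offline CPUs 1..last, then bring them back online.
# CPU0 is left alone, as in the original loop.
hotplug_pass() {
    local last=$1 i
    for ((i = 1; i <= last; i++)); do
        echo 0 > "/sys/devices/system/cpu/cpu$i/online"
    done
    echo "online cpus: $(online_cpus)"
    for ((i = 1; i <= last; i++)); do
        echo 1 > "/sys/devices/system/cpu/cpu$i/online"
    done
    echo "online cpus: $(online_cpus)"
}

# Repeat hotplug passes forever over every present CPU except CPU0.
hotplug_stress() {
    local last
    if ! grep -q ' resctrl ' /proc/mounts; then
        echo "resctrl is not mounted" >&2
        return 1
    fi
    last=$(last_cpu_in_list "$(< /sys/devices/system/cpu/present)")
    while true; do
        hotplug_pass "$last"
    done
}
```

Sourcing the file only defines the functions; the stress loop starts when hotplug_stress is called as root. Offlining CPUs is disruptive, so this belongs on a test machine only.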

I'm looking for a Haswell machine to reproduce the issue.

Thanks.

-Fenghua