Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

From: vcaputo
Date: Sat Dec 23 2017 - 00:33:25 EST


On Wed, Dec 20, 2017 at 01:33:45AM +0100, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, vcaputo@xxxxxxxxxxx wrote:
> > On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > > You forgot to mention commit id :-).
> > >
> >
> > That is very strange, anyhow:
> >
> > commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> > Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Date: Wed Sep 13 23:29:27 2017 +0200
> >
> > x86/apic: Get rid of multi CPU affinity
> >
> >
> > Will try reverting soon, just a bit busy today out in the desert and the sun
> > is going down so my solar panel is useless.
>
> The revert is not trivial.
>
> What is the exact problem and how do you reproduce that?
>
> Thanks,
>

So I had some time today to poke at this some more. Since it looks to
be easily reproduced by simply pulling the AC power while playing music
or doing IO, and dmesg clearly reports using mwait, I tried booting with
idle=nomwait to see if that made any difference. It didn't; the same
thing still occurs.

In trying to make sense of this totally unfamiliar apic code and better
understand these changes, I came across this comment which seemed a bit
telling:

	void flat_vector_allocation_domain(int cpu, struct cpumask *retmask,
					   const struct cpumask *mask)
	{
		/*
		 * Careful. Some cpus do not strictly honor the set of cpus
		 * specified in the interrupt destination when using lowest
		 * priority interrupt delivery mode.
		 *
		 * In particular there was a hyperthreading cpu observed to
		 * deliver interrupts to the wrong hyperthread when only one
		 * hyperthread was specified in the interrupt desitination.
		 */
		cpumask_clear(retmask);
		cpumask_bits(retmask)[0] = APIC_ALL_CPUS;
	}

It's this allocation domain mask hook which has been bypassed by the
offending commit. The previous approach is more robust in the face of
relaxed adherence to destination cpumasks since it's all-inclusive,
whereas the new code targets a single specific cpu.
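
To make that distinction concrete, here is a toy userspace model of how I
read it. This is not kernel code: old_domain_mask(), new_domain_mask() and
vector_present() are names I made up for illustration, and the whole thing
assumes my understanding of the change is right. Only the APIC_ALL_CPUS
value (0xFFu) is taken from the kernel headers.

```c
/* Toy model only; the helper names below are hypothetical, not kernel API. */

#define APIC_ALL_CPUS 0xFFu	/* flat logical mode can address up to 8 cpus */

/* Modelled old behaviour: the vector is set up on every cpu in the flat domain. */
static unsigned int old_domain_mask(int cpu)
{
	(void)cpu;		/* the requested cpu does not narrow the mask */
	return APIC_ALL_CPUS;
}

/* Modelled new behaviour: the vector is set up on the single target cpu only. */
static unsigned int new_domain_mask(int cpu)
{
	return 1u << cpu;
}

/* Nonzero if an interrupt arriving on delivered_cpu finds a vector set up there. */
static int vector_present(unsigned int mask, int delivered_cpu)
{
	return (mask >> delivered_cpu) & 1u;
}
```

With cpu 0 as the intended target, vector_present(old_domain_mask(0), 1)
is 1 while vector_present(new_domain_mask(0), 1) is 0. In other words, in
this model a delivery that strays to cpu 1 is tolerated by the old scheme
and lost by the new one.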

Is it possible what I'm observing is just another manifestation of
what's being described in that comment? This is a Core 2 Duo, so not
hyper-threaded. But maybe something funny happens when switching
C-states in response to interrupts, like maybe the wrong cpu can be used
if it can save power vs. powering up another? Just thinking out loud
here.

In any case, 4.15-rc4 is quite unusable on my machine because of this.

Pavel, do you observe the same behavior on your x60, WRT AC power?

I've dropped Takashi from the CC list as this pretty clearly isn't a
sound-specific problem.

Thanks,
Vito Caputo