Re: thinkpad x60: sound problems in 4.15-rc1 was Re: thinkpad x60: sound problems in 4.14.0-next-20171114

From: vcaputo
Date: Tue Dec 19 2017 - 19:52:12 EST


On Wed, Dec 20, 2017 at 01:33:45AM +0100, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, vcaputo@xxxxxxxxxxx wrote:
> > On Wed, Dec 20, 2017 at 12:22:12AM +0100, Pavel Machek wrote:
> > > You forgot to mention commit id :-).
> > >
> >
> > That is very strange, anyhow:
> >
> > commit fdba46ffb4c203b6e6794163493fd310f98bb4be
> > Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Date: Wed Sep 13 23:29:27 2017 +0200
> >
> > x86/apic: Get rid of multi CPU affinity
> >
> >
> > Will try reverting soon, just a bit busy today out in the desert and the sun
> > is going down so my solar panel is useless.
>
> The revert is not trivial.
>
> What is the exact problem and how do you reproduce that?
>

Dang.

Ostensibly the problem is audio playback looping what seems to be stuck in
a DMA buffer when I pause the audio.

I also saw once during the bisect on a 'bad' commit, a reproduction of
this, which hung everything doing IO until it the ata1 reset happened:

[ 36.606657] Monitor-Mwait will be used to enter C-3 state
[ 36.628663] do_IRQ: 0.35 No irq handler for vector
[ 37.875724] do_IRQ: 0.194 No irq handler for vector
[ 69.099090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 69.099165] ata1.00: failed command: FLUSH CACHE EXT
[ 69.099211] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 6
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 69.099285] ata1.00: status: { DRDY }
[ 69.099309] ata1: hard resetting link
[ 69.406185] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 69.409255] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 69.409259] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 69.409261] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 69.409433] ata1.00: supports DRM functions and may not be fully accessible
[ 69.409819] ata1.00: NCQ Send/Recv Log not supported
[ 69.411644] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 69.411649] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 69.411654] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 69.411842] ata1.00: supports DRM functions and may not be fully accessible
[ 69.412242] ata1.00: NCQ Send/Recv Log not supported
[ 69.412719] ata1.00: configured for UDMA/133
[ 69.412721] ata1.00: retrying FLUSH 0xea Emask 0x4
[ 69.412964] ata1: EH complete

But that is more difficult to reproduce, it doesn't seem to happen
regularly. Infact, I thought that was independent from the audio problem
at first but now it's become clear they're all related.

The 'bad' commits always showed the 'do_IRQ: No irq handler for vector'
lines, sometimes they appear for the first time on the console when I
yanked the power cord while playing music in the quick repro cycle without
running X or anything.

What I've been doing to reproduce this is simply boot to multi-user.target,
login, run `cmus`, and play a song. Next pull the AC power from the
thinkpad x61s.

Immediately the audio gets messed up, pausing the audio doesn't pause it,
it just loops the last tiny buffer contents.

That's all I've got right now, but it doesn't seem to be limited to the
audio problem as shown by the ata1 reset. Also I've searched through all
my stored journals for anything resembling that ata1 problem, and there's
not a single occurrence going back ~300 boots across a handful of kernel
versions.

If you'd like to prep a patch for me to test, I'm happy to test, but I'm
not sure how online I'll be for the next 24 hours. It's some kind of
miracle we're communicating as-is, I'm in the middle of the damn desert but
somehow there's a cell signal.

Thanks,
Vito Caputo