Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

From: Hans de Goede
Date: Tue Oct 30 2018 - 07:05:10 EST


Hi,

On 30-10-18 11:17, Hans de Goede wrote:
Hi Pierre-Louis,

On 29-10-18 23:03, Pierre-Louis Bossart wrote:

On 10/29/18 2:08 PM, Dean Wallace wrote:
On 29-10-18, Andy Shevchenko wrote:
On Mon, Oct 29, 2018 at 7:52 PM Andy Shevchenko
<andy.shevchenko@xxxxxxxxx> wrote:
Cc: Pierre as well.

On Mon, Oct 29, 2018 at 7:48 PM Stephen Boyd <sboyd@xxxxxxxxxx> wrote:
Quoting Dean Wallace (2018-10-25 16:25:17)
I have found a regression in 4.18.15 that means I lose sound on my old
Toshiba Chromebook 2 (Swanky). My system details are:-

Toshiba Chromebook (Swanky)
MrChromebox UEFI coreboot
Arch Linux running latest alsa/pulseaudio

Upgraded kernel from 4.18.14 to 4.18.15 and lost all sound output. By
output I mean, the card is still detected, the module loaded, all apps
showing sound is being playing, but no actual audible sound comes
through. Upgraded to 4.18.16 same issue.

Dug around and found Upstream commit 648e921888ad96ea3dc922739e96716ad3225d7f
clk: x86: Stop marking clocks as CLK_IS_CRITICAL
"This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
devices not being able to reach S0i3 greatly decreasing their battery
drain when suspended."

I reverted it and compiled 4.18.16 and have sound back again. Could
this be looked into, with possibility of fix.

Thanks for the bug report. I'm adding some people involved in the commit
you mention is causing audio regressions. The best plan is to probably
revert the commit from the 4.18 linux stable tree. Or there may be
another patch missing that would be useful to make this backported patch
work. Hopefully Hans or Andy knows.
Hans has been investigating S0ix issues on Baytrail and Cherrytrail machines.
I have a feeling that the problem can be fixed by properly handling
clock in ASoC driver(s). Perhaps Hans and Pierre can figure this out
better than me.
Looking to sound/soc/intel/boards/cht_bsw_rt5672.c I see no
suspend-resume hooks. Perhaps, adding them like in the commit
ac8bd9e13be2 ("r8169: Disable clk during suspend / resume") would
help.

I missed this change while i was away. It's indeed the expectation that the audio mclk is handled by the firmware,

I believe that the statement "It's indeed the expectation that the
audio mclk is handled by the firmware" is not correct for BYT/CHT
platforms (and the commit causing the regression only affects BYT/CHT
platforms). Various machine drivers under sound/soc/intel/boards have
code to deal with the mclk themselves, like this:

ÂÂÂÂÂÂÂ drv->mclk = devm_clk_get(&pdev->dev, "pmc_plt_clk_3");
...

ÂÂÂÂÂÂÂ if (SND_SOC_DAPM_EVENT_ON(event)) {
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ ret = clk_prepare_enable(ctx->mclk);
ÂÂÂÂÂÂÂ ...
ÂÂÂÂÂÂÂ } else {
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ clk_disable_unprepare(ctx->mclk);
ÂÂÂÂÂÂÂ }

The above code is from sound/soc/intel/boards/cht_bsw_max98090_ti.c
which is the machine driver used on the machines for which problems
are now being reported.

And my commit introducing the problem is in essence a revert of:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=d31fd43c0f9a41e2678a1e78c0f22f0384c6edd3

Which is from 2017-07-14 and the ASoC code for BYT/CHT platforms is
much older then that.

Also I asked Carlo why he wrote that patch and it was to fix a problem
with ethernet on some laptops.

not sure I understand why removing the CLK_IS_CRITICAL was necessary or what it has to do with S0ix.

It is necessary, because if it is set the clock never gets disabled and
on x86 platforms using suspend2idle we are responsible for *all* the hardware
power-management as OS. If we do not disable the clocks then we can only
reach S0i1 instead of S0i3 when suspended leading to increased battery
drain during suspend.

Also note that this patch is only causing problems on CHT + max98090 codec
using machines. I've tested it on many other BYT/CHT machines myself and
we've not had any other bug reports related to this.

So this clearly points to a problem with the clock management in the
cht_bsw_max98090_ti.c machine driver.

I've some ideas how to fix this and I will prepare some patches to test.

p.s.

It seems that a sub-thread of this thread where me and the reporter have
been working on debugging this has lost most of the people in the Cc.

So here is a quick status update. So far this issue has been seen on
Swanky and Clapper model chromebooks.

Reverting the patch causing this regression and then doing:

for i in /sys/kernel/debug/clk/pmc_plt_clk_?; do echo -n "$i: "; cat $i/clk_flags; echo; done

has shown that before my patch to not set CLK_IS_CRITICAL the
CLK_IS_CRITICAL was being set on pmc_plt_clk_0. Which shows
that that clock is somehow being used for sounds on these boards.

I think that pmc_plt_clk_0 is being used instead of pmc_plt_clk_3,
but it could also be the case that both are being used.

I'm currently preparing patches to test both scenarios.

Regards,

Hans