RE: [EXTERNAL] Re: [PATCH] ASoC: max98373: Mark cache dirty before entering sleep

From: Ryan Lee
Date: Mon Sep 27 2021 - 18:43:12 EST


> -----Original Message-----
> From: Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx>
> Sent: Monday, September 27, 2021 12:34 PM
> To: Ryan Lee <RyanS.Lee@xxxxxxxxxxxxxxxxxxx>; Mark Brown
> <broonie@xxxxxxxxxx>
> Cc: lgirdwood@xxxxxxxxx; perex@xxxxxxxx; tiwai@xxxxxxxx; yung-
> chuan.liao@xxxxxxxxxxxxxxx; guennadi.liakhovetski@xxxxxxxxxxxxxxx; alsa-
> devel@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> sathya.prakash.m.r@xxxxxxxxx; ryan.lee.maxim@xxxxxxxxx
> Subject: Re: [EXTERNAL] Re: [PATCH] ASoC: max98373: Mark cache dirty
> before entering sleep
>
> EXTERNAL EMAIL
>
>
>
> >> Instead of changing the suspend sequence, can we please try to modify
> >> the
> >> max98373_io_init() routine to unconditionally flag the cache as
> >> dirty, maybe this points to a problem with the management of the
> >> max98373->first_hw_init flag.
> >
> > max98373_io_init() is not called because ' sdw_slave_status' remains '
> > SDW_SLAVE_ATTACHED' and 'max98373->hw_init' is already true.
> > Removing 'if (max98373->hw_init || status != SDW_SLAVE_ATTACHED)'
> > condition in max98373_update_status() function instead of adding
> > regcache_mark_dirty() into max98373_suspend() can be an alternative way.
> > I think it is all about where regcache_mark_dirty() is called from.
> > The difference is that max98373_io_init() really do the software reset
> > and do amp initialization again which could be an overhead.
>
> that description is aligned with my analysis that there's something very
> wrong happening here, it's not just a simple miss in the regmap handling but
> a major conceptual bug or misunderstanding in the way reset is handled.
>
> First, there's the spec: on a reset initiated by the host or if the device loses
> sync for ANY reason, its status cannot remain ATTACHED.
> There's got to be a 16-frame period at least where the device has to monitor
> the sync pattern and cannot drive anything on the bus.
>
> Then there's the hardware behavior on resume: on resume by default the
> Intel host will toggle the data pin for at least 4096 frames, which by spec
> means severe reset.
>
> And last, there's the software init: we also force the status as UNATTACHED
> in drivers/soundwire/intel.c:
>
> /*
> * make sure all Slaves are tagged as UNATTACHED and provide
> * reason for reinitialization
> */
> sdw_clear_slave_status(bus,
> SDW_UNATTACH_REQUEST_MASTER_RESET);
>
> But we've also seen the opposite effect of an amplifier reporting attached
> but losing sync immediately after the end of enumeration and never coming
> back on the bus, see issue
> https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> ub.com%2Fthesofproject%2Flinux%2Fissues%2F3063&amp;data=04%7C01%
> 7Cryans.lee%40maximintegrated.com%7Cb9f84a1267ec4f50b7a008d981edc
> c46%7Cfbd909dfea694788a554f24b7854ad03%7C0%7C0%7C637683680607
> 026027%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2
> luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=rARkwTSB3
> DN%2BCYxGaehOhtCGEj1eLBl6Mk7QhynQSY8%3D&amp;reserved=0
>
> In other words, we need to check what really happens on resume and why
> the amplifier keeps reporting its status as ATTACHED despite the spec
> requirements and software init, or loses this status after
> enumeration....Something really does not add-up, again it's not just a
> regmap management issue.
>

I agree that the fix is not necessary if the reset issue was not occurred.
Thanks for your input about the status check on Intel driver, too.
I will continue to find the culprit who cause the amp reset, but this is not simple
because it is not only related to Maxim driver but also other things
on both hardware and software side.
I think making the amp driver code more conservative by adding
regcache_mark_dirty() can make the system robust from the glitches between
suspend and resume. This is what I can do from the amp driver side and
the code I added is not new but a proven way on existing drivers as we all know.
I just wonder why the issue was not observed when the code is removed.
This probably means there is an external reason which was not exist before.

>
>