Re: Linux logs error: `mei_me 0000:00:16.0: cl:host=04 me=00 is not connected`

From: Paul Menzel
Date: Mon Apr 01 2024 - 09:07:33 EST


Dear Alexander,


Thank you very much for your reply.

On 01.04.24 at 08:54, Usyskin, Alexander wrote:
-----Original Message-----

Sent: Saturday, March 30, 2024 13:56
On 30.03.24 at 11:50, Winkler, Tomas wrote:

-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Friday, March 29, 2024 12:49 PM

[…]

On a Dell XPS 13 9360/0596KF, BIOS 2.21.0 06/02/2022 with Debian
sid/unstable and self-built Linux 6.9-rc1+ with one patch on top [1] and
KASAN enabled.

$ git log --no-decorate --oneline -2 a2ce022afcbb
a2ce022afcbb [PATCH] kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
8d025e2092e2 Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

After several ACPI S3 (deep) suspend and resume cycles, this morning I
noticed the error below:

[29357.177635] mei_me 0000:00:16.0: cl:host=04 me=00 is not connected

This seems to be logged from `mei_write()` in `drivers/misc/mei/main.c`.

    	if (!mei_cl_is_connected(cl)) {
    		cl_err(dev, cl, "is not connected");
    		rets = -ENODEV;
    		goto out;
    	}

with `drivers/misc/mei/client.h` containing:

    /**
     * mei_cl_is_connected - host client is connected
     *
     * @cl: host client
     *
     * Return: true if the host client is connected
     */
    static inline bool mei_cl_is_connected(const struct mei_cl *cl)
    {
    	return cl->state == MEI_FILE_CONNECTED;
    }

Unfortunately, I do not know why the ME needs to be written to, what
was being written, or what effect this failure has.

Could you please take a look at it?

Looks like a timing issue between the graphics driver setting up HDCP
and device power management. I don't think this is really an issue if
it happens during power-cycle stress.

Understood. Could this be because of the Kernel Address Sanitizer (KASAN)?

Anyway we will look at that, will you be able to provide more debug
information if we ask for it?

Thank you. Yes, I can test patches. But right now, I was only able to
see this once, so I am not sure how to reproduce it.

This print is in a code path executed from user space only. It seems
some user-space application had a connection open before suspend and
tried to write after resume, but the driver closes all connections on
suspend. This is the normal flow; user space should reopen the handle
and retry in this case.
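
For illustration only, the "reopen and retry" flow described above could
be sketched in user space roughly as follows. This is not code from any
real MEI client. The names `session_ops`, `send_with_reconnect` and the
`fake_*` helpers are hypothetical. A real `reconnect` callback would
presumably reopen the MEI character device and reissue
`IOCTL_MEI_CONNECT_CLIENT`, and a real `send` callback would call
`write()`, which fails with `ENODEV` when `mei_write()` takes the path
quoted earlier. A fake device stands in here so the flow can be exercised
without hardware.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback set; a real client would back these with the
 * MEI character device (reopen + connect ioctl, then write()). */
struct session_ops {
	int (*reconnect)(void *ctx);
	ssize_t (*send)(void *ctx, const void *buf, size_t len);
};

/*
 * Try to send. If the send fails with ENODEV (the client is no longer
 * connected, e.g. after suspend), reconnect and retry, at most
 * max_retries times. Returns the byte count, or -1 with errno set.
 */
static ssize_t send_with_reconnect(const struct session_ops *ops, void *ctx,
				   const void *buf, size_t len,
				   int max_retries)
{
	for (int attempt = 0; ; attempt++) {
		ssize_t n = ops->send(ctx, buf, len);

		if (n >= 0)
			return n;
		if (errno != ENODEV || attempt >= max_retries)
			return -1;
		if (ops->reconnect(ctx) < 0)
			return -1;
	}
}

/* Stand-in device: starts disconnected, as after a resume. */
struct fake_dev {
	int connected;
	int reconnects;
};

static int fake_reconnect(void *ctx)
{
	struct fake_dev *d = ctx;

	d->connected = 1;
	d->reconnects++;
	return 0;
}

static ssize_t fake_send(void *ctx, const void *buf, size_t len)
{
	struct fake_dev *d = ctx;

	(void)buf;
	if (!d->connected) {
		errno = ENODEV;	/* mirrors rets = -ENODEV in mei_write() */
		return -1;
	}
	return (ssize_t)len;
}

static const struct session_ops fake_ops = { fake_reconnect, fake_send };
```

With the fake device, `send_with_reconnect(&fake_ops, &dev, "ping", 4, 1)`
fails once with `ENODEV`, reconnects, and then succeeds. With
`max_retries` set to 0, the `ENODEV` error is passed straight back to the
caller.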

Interesting. Which user-space program could this be?

The print can be demoted to debug, I think.

Understood. Still, maybe the message could also be extended, so that the
cause and solution can be deduced from the Linux logs.


Kind regards,

Paul


PS: Only if you care:

--
Alexander (Sasha) Usyskin

Your signature delimiter is missing the trailing space at the end [1].


[1]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter