Re: [char-misc-next] mei: use kvmalloc for read buffer

From: Brian Geffon
Date: Mon Oct 14 2024 - 15:11:56 EST


On Mon, Oct 14, 2024 at 02:43:31PM +0000, Usyskin, Alexander wrote:
>
> > -----Original Message-----
> > From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Sent: Monday, October 14, 2024 4:25 PM
> > To: Usyskin, Alexander <alexander.usyskin@xxxxxxxxx>
> > Cc: Weil, Oren jer <oren.jer.weil@xxxxxxxxx>; Tomas Winkler
> > <tomasw@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: [char-misc-next] mei: use kvmalloc for read buffer
> >
> > On Mon, Oct 14, 2024 at 01:15:49PM +0000, Usyskin, Alexander wrote:
> > > > -----Original Message-----
> > > > From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > Sent: Sunday, October 13, 2024 6:08 PM
> > > > To: Usyskin, Alexander <alexander.usyskin@xxxxxxxxx>
> > > > Cc: Weil, Oren jer <oren.jer.weil@xxxxxxxxx>; Tomas Winkler
> > > > <tomasw@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> > > > Subject: Re: [char-misc-next] mei: use kvmalloc for read buffer
> > > >
> > > > On Sun, Oct 13, 2024 at 02:22:27PM +0000, Usyskin, Alexander wrote:
> > > > > > -----Original Message-----
> > > > > > From: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > > > Sent: Sunday, October 13, 2024 3:14 PM
> > > > > > To: Usyskin, Alexander <alexander.usyskin@xxxxxxxxx>
> > > > > > Cc: Weil, Oren jer <oren.jer.weil@xxxxxxxxx>; Tomas Winkler
> > > > > > <tomasw@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
> > > > > > Subject: Re: [char-misc-next] mei: use kvmalloc for read buffer
> > > > > >
> > > > > > On Sun, Oct 13, 2024 at 02:53:14PM +0300, Alexander Usyskin wrote:
> > > > > > > Read buffer is allocated according to max message size,
> > > > > > > reported by the firmware and may reach 64K in systems
> > > > > > > with pxp client.
> > > > > > > Contiguous 64k allocation may fail under memory pressure.
> > > > > > > Read buffer is used as in-driver message storage and
> > > > > > > not required to be contiguous.
> > > > > > > Use kvmalloc to allow kernel to allocate non-contiguous
> > > > > > > memory in this case.
> > > > > > >
> > > > > > > Signed-off-by: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>

Tested-by: Brian Geffon <bgeffon@xxxxxxxxxx>

> > > > > > > ---
> > > > > > > drivers/misc/mei/client.c | 4 ++--
> > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > What about this thread:
> > > > > > > https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@xxxxxxxxxxxx/
> > >
> > > > [1] https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@xxxxxxxxxxxx/
> >
> > Yes, it's a problem, I don't understand.
> >
> > > > > >
> > > > > > No attribution for the reporter? Does it solve their problem?
> > > > > >
> > > > > This patch is a result from non-public bug report on ChromeOS.
> > > >
> > > > Then make that bug report public as it was discussed in public already :)
> > > >
> > > Unfortunately, it is not my call.
> > > For now, I'll anchor this on [1]
> > >
> > > > > > > Also, where is this memory pressure coming from, what is the root cause
> > > > > > > and what commit does this fix? Stable backports needed? Anything else?
> > > > > >
> > > > > The ChromeOS is extremely short on memory by design and can trigger
> > > > > this situation very easily.
> > > >
> > > > So normal allocations are failing? That feels wrong, what caused this?
> > >
> > > 64K is order 4 allocation and may fail according to [1].
> >
> > And what changed to cause this to suddenly be 64k? And why can't we
> > allocate 64k at this point in time now?
> >
> > > > > > I do not think that this patch fixes any commit - the problematic code exists
> > > > > > from the earliest versions of this driver.
> > > > > As this problem reproduced only on ChromeOS I believe that no need
> > > > > in wide backport, the ChromeOS can cherry-pick the patch.
> > > > > From your experience, is this the right strategy?
> > > >
> > > > No.
> > >
> > > Sure, I'll use
> > > Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.")
> > > where the first time such buffer allocated and add stable here in v2.
> >
> > So the problem has been there for years? Why is it just now showing up?
> >
>
> I suppose it is the combination of some fairly new FW that requests 64K buffer
> for content-protection case, underpowered ChromeBook and ChromeOS running
> content-protection flow.
> All three conditions should be met to trigger this failure.

That's correct; we've seen this on kernels as old as 5.4. I have
personally reproduced this issue and can confirm that vmalloc does fix
it.
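
For anyone checking the "64K is order 4" figure quoted above, here is a
rough userspace sketch of the arithmetic. It assumes 4 KiB pages, and
alloc_order() is a hypothetical helper written for this illustration; it
is not the kernel's get_order() and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): smallest order such that
 * (page_size << order) >= size, i.e. the power-of-two number of
 * physically contiguous pages a kmalloc-style allocation would need.
 */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	/* Round the request up to a whole number of pages. */
	size_t pages = (size + page_size - 1) / page_size;
	unsigned int order = 0;

	/* Find the smallest power of two that covers that many pages. */
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}

/* Small self-check, assuming 4 KiB pages. */
static void alloc_order_selftest(void)
{
	assert(alloc_order(4096, 4096) == 0);      /* one page */
	assert(alloc_order(64 * 1024, 4096) == 4); /* 16 contiguous pages */
}
```

With 4 KiB pages a 64K buffer needs 16 physically contiguous pages, which
is the kind of request that gets harder to satisfy once memory is
fragmented. kvmalloc can fall back to vmalloc when the contiguous attempt
fails, so the buffer only has to be virtually contiguous.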

>
> > thanks,
> >
> > greg k-h
>
> - -
> Thanks,
> Sasha
>
>