Re: [STABLE PATCH] slub: make ->cpu_partial unsigned int

From: Matthew Wilcox
Date: Sun Sep 30 2018 - 09:31:13 EST


On Sun, Sep 30, 2018 at 06:10:26AM -0700, Greg KH wrote:
> On Sun, Sep 30, 2018 at 05:50:38AM -0700, Matthew Wilcox wrote:
> > On Sun, Sep 30, 2018 at 06:28:21PM +0800, zhong jiang wrote:
> > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > >
> > > [ Upstream commit e5d9998f3e09359b372a037a6ac55ba235d95d57 ]
> > >
> > > /*
> > > * cpu_partial determined the maximum number of objects
> > > * kept in the per cpu partial lists of a processor.
> > > */
> > >
> > > Can't be negative.
> > >
> > > I hit a real issue in which this results in a large memory leak.
> > > Slabs are freed in interrupt context, which can trigger the issue:
> > > put_cpu_partial can be interrupted more than once.
> > > Because lru and pobjects share a union in struct page, when another core
> > > handles the page->lru list (for example, remove_partial in the slab-freeing
> > > code flow), pobjects ends up as a negative value (0xdead0000). Therefore, a
> > > large number of slabs will be added to the per-cpu partial list.
> > >
> > > I posted the issue to the community before. The detailed description is as follows.
> > >
> > > https://www.spinics.net/lists/kernel/msg2870979.html
> > >
> > > After applying the patch, the issue is fixed, so the patch is an effective bugfix.
> > > It should go into stable.
> > >
> > > Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@xxxxxxxxx
> > > Signed-off-by: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > Acked-by: Christoph Lameter <cl@xxxxxxxxx>
> >
> > Hang on. Christoph acked the _original_ patch going into upstream.
> > When he reviewed this patch for _stable_ last week, he asked for more
> > investigation. Including this patch in stable is misleading.
>
> But the original patch has been in upstream for a long time now (it went
> into 4.17-rc1). If there was a real problem here, wouldn't it have
> been resolved already?
>
> And the patch in mainline has Christoph's ack...

I'm not saying there's a problem with the patch. It's that the rationale
for backporting doesn't make any damned sense. There's something going
on that nobody understands. This patch is probably masking an underlying
problem that will pop back up and bite us again someday.
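
For reference, the signedness effect described in the quoted report can be
sketched in a few lines of C. This is a minimal illustration under stated
assumptions, not the kernel code: should_drain_signed() and
should_drain_unsigned() are hypothetical stand-ins for the limit check on the
per-cpu partial list, the limit of 30 used below is arbitrary, and the only
value taken from the thread is the poison pattern 0xdead0000.

```c
#include <stdbool.h>

/*
 * The bit pattern 0xdead0000 read back as a 32-bit signed int.
 * 0xdead0000 - 2^32 = -559087616.
 */
#define POISONED_POBJECTS (-559087616)

/*
 * Hypothetical stand-in for the limit check with a signed limit.
 * A negative pobjects never exceeds the limit, so the list is
 * never considered full.
 */
static bool should_drain_signed(int pobjects, int cpu_partial)
{
	return pobjects > cpu_partial;
}

/*
 * The same check with an unsigned limit. C converts the signed operand
 * to unsigned in a mixed comparison; the cast only makes that conversion
 * visible. A negative pobjects becomes a huge positive value and exceeds
 * any realistic limit.
 */
static bool should_drain_unsigned(int pobjects, unsigned int cpu_partial)
{
	return (unsigned int)pobjects > cpu_partial;
}
```

With the poisoned count and a limit of 30, should_drain_signed() returns false
and should_drain_unsigned() returns true. That would be consistent with the
symptom going away after the patch, while saying nothing about how pobjects
came to hold a poison value in the first place.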