Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?

From: Mel Gorman
Date: Wed Jan 30 2019 - 05:40:26 EST


On Tue, Jan 29, 2019 at 11:29:37PM -0500, valdis.kletnieks@xxxxxx wrote:
> On Tue, 29 Jan 2019 20:06:39 -0500, valdis.kletnieks@xxxxxx said:
> > On Mon, 28 Jan 2019 10:16:27 +0100, Jan Kara said:
> >
> > > So my buffer_migrate_page_norefs() is certainly buggy in its current
> > > incarnation (as a result block device page cache is not migratable at all).
> > > I've sent Andrew a patch over a week ago but so far it got ignored. The patch
> > > is attached, can you give it a try whether it changes something for you?
> > > Thanks!
> >
> > Been running with the patch for about 24 hours, haven't seen kcompactd
> > misbehave. I even fired up a Chrome with a lot of tabs open, a Firefox, and a
> > kernel build, intentionally drove the system into swapping, and kcompactd
> > didn't make it into the top 10 on 'top'.
> >
> > I'm willing to say put a "tested-by:" on that one, it looks fixed from here.
> > If there's any remaining bugs, they're ones I can't seem to trigger...
>
> Spoke too soon. Sitting here not stressing the laptop at all, plenty of free
> memory, and ka-blam.
>
> Will keep my eyes open and do the data gathering Mel Gorman wanted - I discovered
> too late that trace-cmd wasn't installed, and things broke free by themselves (probably
> not coincidence that I launched a terminal window and then it cleared....)
>

That's unfortunate. I also note that linux-next still has not been
updated with the latest version of the compaction series. Nevertheless,
it might be helpful to get the output of

grep -r . /sys/kernel/mm/transparent_hugepage/*

and the trace when the system is in normal use but kcompactd has not
pegged at 100%. At minimum, I'd like to see what the sources of high-order
allocations are and the likely causes of wakeups of kcompactd in case
there are any hints there. Your Kconfig is also potentially useful.
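
As a rough sketch of how that data could be gathered: the function names,
the default output file and the choice of trace events below are
assumptions for illustration, not a required recipe. It assumes trace-cmd
is installed and that the trace is recorded as root.

```shell
#!/bin/sh
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Print every readable file under a directory as "path:content".
# Defaults to the transparent hugepage sysfs directory.
dump_thp_settings() {
    dir=${1:-/sys/kernel/mm/transparent_hugepage}
    grep -r . "$dir" 2>/dev/null
}

# Record compaction events plus page allocations of order > 0 for a
# fixed number of seconds. Needs root and trace-cmd.
#   $1 = duration in seconds (default 60)
#   $2 = output file (default trace.dat)
record_kcompactd_trace() {
    secs=${1:-60}
    out=${2:-trace.dat}
    # -f applies the filter to the event named just before it.
    trace-cmd record -o "$out" \
        -e compaction \
        -e kmem:mm_page_alloc -f 'order > 0' \
        sleep "$secs"
}
```

Calling dump_thp_settings with no argument reads the THP directory, and
record_kcompactd_trace 60 captures one minute of events; the resulting
file can be read back with trace-cmd report.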

Thanks.

--
Mel Gorman
SUSE Labs