Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

From: Kent Overstreet
Date: Thu Dec 01 2016 - 08:50:23 EST


On Wed, Nov 30, 2016 at 03:30:11PM -0500, Tejun Heo wrote:
> Hello,
>
> On Wed, Nov 30, 2016 at 10:14:50AM -0800, Linus Torvalds wrote:
> > Tejun/Kent - any way to just limit the workqueue depth for bcache?
> > Because that really isn't helping, and things *will* time out and
> > cause those problems when you have hundreds of IO's queued on a disk
> > that likely has a write iops around ~100..
>
> Yeah, easily.  I'm assuming it's gonna be the bcache_wq allocated in
> bcache_init().  It's currently using 0 as @max_active and it can
> be set to any arbitrary number.  It'd be a very crude way to control
> what looks like a buffer bloat with IOs tho. We can make it a bit
> more granular by splitting workqueues per bcache instance / purpose
> but for the long term the right solution seems to be hooking into
> writeback throttling mechanism that block layer just grew recently.

Agreed that the writeback code is the right place to do it. Within bcache we
can't really do anything smarter than just throw a hard limit on the number of
outstanding IOs and enforce it by blocking in generic_make_request(), and the
bcache code is the wrong place to do that - we don't know what the limit should
be there, and all the IOs look the same at that point so you'd probably still
end up with writeback starving everything else.
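
For concreteness, a hard limit of that kind amounts to a counting gate in
front of submission. The following is a minimal userspace sketch with
pthreads, not bcache or block layer code: struct io_limit and the
io_limit_*() helpers are hypothetical names invented for illustration, and
the value of max is exactly the number we have no good way to pick.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical illustration only: none of these names exist in bcache. */
struct io_limit {
	pthread_mutex_t	lock;
	pthread_cond_t	wait;
	unsigned	inflight;	/* IOs submitted but not yet completed */
	unsigned	max;		/* hard cap on outstanding IOs */
};

static void io_limit_init(struct io_limit *l, unsigned max)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->wait, NULL);
	l->inflight = 0;
	l->max = max;
}

/* Take a slot only if one is free; never sleeps. */
static bool io_limit_try_get(struct io_limit *l)
{
	bool ok;

	pthread_mutex_lock(&l->lock);
	ok = l->inflight < l->max;
	if (ok)
		l->inflight++;
	pthread_mutex_unlock(&l->lock);
	return ok;
}

/* Take a slot, sleeping until a completion frees one. */
static void io_limit_get(struct io_limit *l)
{
	pthread_mutex_lock(&l->lock);
	while (l->inflight >= l->max)
		pthread_cond_wait(&l->wait, &l->lock);
	l->inflight++;
	pthread_mutex_unlock(&l->lock);
}

/* Called on IO completion: free the slot and wake one waiter. */
static void io_limit_put(struct io_limit *l)
{
	pthread_mutex_lock(&l->lock);
	assert(l->inflight > 0);
	l->inflight--;
	pthread_cond_signal(&l->wait);
	pthread_mutex_unlock(&l->lock);
}
```

Note that the gate treats every caller the same: it has no idea whether a
slot is going to writeback or to a foreground read, which is the starvation
problem described above.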

I could futz with the workqueue stuff, but that'd likely as not break some other
workload - I've spent enough time as it is fighting with workqueue concurrency
stuff in the past. My preference would be to just try and get Jens's stuff in.

That said, I'm not sure how I feel about Jens's exact approach... it seems to me
that this can really just live within the writeback code; I don't know why it
should involve the block layer at all. Plus, if I understand correctly, his code
has the effect of blocking in generic_make_request() to throttle, which means
due to the way the writeback code is structured we'll be blocking with page
locks held. I did my own thing in bcachefs, same idea but throttling in
writepages... it's dumb and simple but it's worked exceedingly well, as far as
actual usability and responsiveness:

https://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/fs-io.c?h=bcache-dev&id=acf766b2dd33b076fdce66c86363a3e26a9b70cf#n1002

That said, any kind of throttling for writeback will be a million times better
than the current situation...
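
To illustrate the ordering point about page locks, here is a rough userspace
model of throttling at the writepages level. It is not the code at the link
above and not any kernel API: struct wb_throttle, struct fake_page and the
functions below are all made up for this sketch, and a byte budget is just
one assumed way to express the limit. The only thing it is meant to show is
that the budget is reserved before any page is locked, so a throttled writer
never waits while holding page locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this only models the ordering. */
struct wb_throttle {
	size_t inflight_bytes;	/* writeback submitted, not yet completed */
	size_t max_bytes;	/* assumed budget for in-flight writeback */
};

struct fake_page {		/* stand-in for a real page */
	bool	locked;
	size_t	len;
};

/*
 * Try to reserve budget. An idle throttle always admits one request so an
 * oversized batch cannot stall forever.
 */
static bool wb_reserve(struct wb_throttle *t, size_t bytes)
{
	if (t->inflight_bytes != 0 &&
	    (t->inflight_bytes >= t->max_bytes ||
	     bytes > t->max_bytes - t->inflight_bytes))
		return false;
	t->inflight_bytes += bytes;
	return true;
}

static void wb_complete(struct wb_throttle *t, size_t bytes)
{
	assert(t->inflight_bytes >= bytes);
	t->inflight_bytes -= bytes;
}

/* Returns the number of pages submitted, or 0 if throttled. */
static size_t fake_writepages(struct wb_throttle *t,
			      struct fake_page *pages, size_t nr)
{
	size_t i, bytes = 0;

	for (i = 0; i < nr; i++)
		bytes += pages[i].len;

	/* Throttle first: real code would sleep here, with no pages locked. */
	if (!wb_reserve(t, bytes))
		return 0;

	/* Only now lock the pages and "submit" them. */
	for (i = 0; i < nr; i++)
		pages[i].locked = true;
	return nr;
}

/* Completion path: unlock the pages and return their budget. */
static void fake_write_done(struct wb_throttle *t,
			    struct fake_page *pages, size_t nr)
{
	size_t i, bytes = 0;

	for (i = 0; i < nr; i++) {
		pages[i].locked = false;
		bytes += pages[i].len;
	}
	wb_complete(t, bytes);
}
```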