Re: [PATCH] mm: memcontrol: asynchronous reclaim for memory.high

From: Chris Down
Date: Wed Feb 19 2020 - 14:17:12 EST

Michal Hocko writes:
On Wed 19-02-20 13:12:19, Johannes Weiner wrote:
We have received regression reports from users whose workloads moved
into containers and subsequently encountered new latencies. For some
users these were a nuisance, but for some it meant missing their SLA
response times. We tracked those delays down to cgroup limits, which
inject direct reclaim stalls into the workload where previously all
reclaim was handled by kswapd.

I am curious why this is unexpected when the high limit is explicitly
documented as a throttling mechanism.

Throttling is what one expects if one's workload does not respect the high threshold (i.e. if it's not possible to reclaim the requisite number of pages). You don't expect throttling just because you're brushing up against the threshold, though: we only reclaim at certain points (e.g. on return to usermode), so usage can sit slightly above memory.high in the meantime even when reclaim would succeed.

That is, the workload may be well-behaved, and it's just that we haven't gotten around to completing reclaim yet. In that case, throttling the workload when there's no evidence it's misbehaving seems unduly harsh, hence the ~4% grace, with exponentially increasing penalties as the evidence that the workload is pathological grows.
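To make the grace-band-plus-escalating-penalty idea concrete, here is a simplified user-space model in C. It is a hypothetical sketch, not the kernel's code: the function name `throttle_jiffies`, the `HZ` value, the 4% grace band, the quadratic penalty curve and the 2-second cap are all assumptions chosen for illustration, and the real logic in mm/memcontrol.c differs in its scaling and details.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constants for illustration only; they are not taken
 * from the kernel.
 */
#define HZ 1000u                    /* assumed timer ticks per second */
#define GRACE_PERMILLE 40u          /* ~4% overage tolerated without delay */
#define EXCESS_CAP 1000u            /* cap before squaring to avoid overflow */
#define MAX_DELAY_JIFFIES (2u * HZ) /* assumed upper bound on a single delay */

/*
 * Jiffies to sleep on return to user mode for a cgroup using `usage`
 * pages against a memory.high of `high` pages (hypothetical model).
 */
static uint64_t throttle_jiffies(uint64_t usage, uint64_t high)
{
    uint64_t over, excess, delay;

    if (usage <= high)
        return 0;                 /* within bounds: never throttle */
    if (high == 0)
        return MAX_DELAY_JIFFIES; /* any usage exceeds a zero limit */

    /* Overage as a fraction of memory.high, in permille. */
    over = (usage - high) * 1000u / high;

    /*
     * Brushing against the limit is not evidence of misbehaviour:
     * reclaim may simply not have caught up yet, so allow a grace band.
     */
    if (over <= GRACE_PERMILLE)
        return 0;

    /* Beyond the grace band, the penalty grows quadratically. */
    excess = over - GRACE_PERMILLE;
    if (excess > EXCESS_CAP)
        excess = EXCESS_CAP;
    delay = excess * excess * HZ / 10000u;

    return delay < MAX_DELAY_JIFFIES ? delay : MAX_DELAY_JIFFIES;
}
```

Under this model, a cgroup 4% over memory.high is not delayed at all, one 5% over sleeps 10 jiffies, and one 14% over sleeps 1000 jiffies (one second at the assumed `HZ`), so the delay stays negligible for near-misses and becomes severe only as the overage persists and grows.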

So sure, memory.high is a throttling mechanism if you *exceed* the stated bounds of your allocation, but this is about applications that are well-behaved and merely brushing against those bounds, which is expected on a system where the bottleneck is the cgroup limit rather than global memory pressure.